ACSUG: Scalable Windows Azure Patterns


Windows Azure provides you with the capabilities to infinitely scale your applications, but how do you achieve this effectively and efficiently? In this session we will introduce the patterns and anti-patterns of scalability on the Windows Azure platform, demonstrating how to leverage connected systems technologies like the Azure AppFabric Service Bus to achieve scale, and an implementation of some of these patterns that demonstrates how to cost-effectively scale your architectures.


Slide notes
  • You probably have one of these guys working in your organisation or for your clients. It is important for your career and your project that you keep this guy happy. This may not be a fair interpretation in your organisation; his axe may actually be bigger.
  • He doesn't want you costing the business lots of this by designing architectures with a poor cost profile. In this presentation I will explain some of the patterns of scalability available on the Azure platform and the cost benefits of specific approaches that will help you achieve scale while minimising costs.
  • What are the patterns of scalability on the Azure platform? I will demonstrate examples of scalable implementations and outline the cost savings associated with the correct design approach.
  • Traditional n-tiered application: a web tier, X number of application tiers, and your data tier.
  • Animate this slide, start with three, scale up, highlight contention
  • Synchronous processes. Sequential units of work. Tight coupling between components. Stateful. Pessimistic concurrency. Clustering for HA. Vertical scaling: to scale, get bigger servers, which is expensive, has scaling limits, and makes inefficient use of resources.
  • Large sequential units of work take a lot longer to process. Small units of work can be executed in parallel and take less time to reprocess if there is a failure. Achieve efficiencies of scale by processing batches of data, usually because the overhead of an operation is amortized across multiple requests. There is a balance between reliability and cost: smaller units of work can be retried quickly if they fail but carry a cost overhead, while large batches cost less but take longer to reprocess on failure. Ensure that you create configuration settings for this and tune them appropriately. Entity Group Transactions: the ability to perform an atomic transaction across up to 100 entities with any combination of Insert, Update, or Delete for Azure Tables. The requirement is that all of the entities must be in the same table, have the same PartitionKey value, and the total request size must be under 4 MB. Used appropriately this enables small logical units of work and retry-based recoverability.
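The entity-group-transaction batching rule above (at most 100 entities, all sharing one PartitionKey per batch) can be sketched as follows; the `batch_entities` helper and the dictionary shape are illustrative, not part of any Azure SDK.

```python
from itertools import groupby

MAX_BATCH = 100  # entity group transaction limit described above

def batch_entities(entities):
    """Group entities by PartitionKey, then chunk each group into
    batches of at most MAX_BATCH so each batch could be committed
    as one atomic entity group transaction."""
    batches = []
    keyfunc = lambda e: e["PartitionKey"]
    for _, group in groupby(sorted(entities, key=keyfunc), key=keyfunc):
        group = list(group)
        for i in range(0, len(group), MAX_BATCH):
            batches.append(group[i:i + MAX_BATCH])
    return batches

rows = [{"PartitionKey": "A", "RowKey": str(i)} for i in range(250)] + \
       [{"PartitionKey": "B", "RowKey": str(i)} for i in range(30)]
batches = batch_entities(rows)
# 250 rows in partition A -> 3 batches; 30 rows in partition B -> 1 batch
```

Each resulting batch is a small logical unit of work that can be retried on its own if it fails.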
  • Work on the same task in parallel on multiple processing units. Using scaled-out compute instances you can spread the load across those instances. Obviously single-threaded applications will end up with huge queues of work and under-utilised compute resources. Parallel processing will make better use of available resources and increase throughput, but ensure you test appropriately and watch performance counters such as utilisation and contention.
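A minimal sketch of the single-threaded versus multi-threaded contrast in this note, using Python's standard thread pool; `work_item` is a hypothetical stand-in for dequeuing and processing one message.

```python
from concurrent.futures import ThreadPoolExecutor

def work_item(n):
    # Placeholder unit of work; in a real worker role this would
    # dequeue a message and process it.
    return n * n

tasks = range(100)

# Single-threaded: one unit of work at a time.
sequential = [work_item(n) for n in tasks]

# Multi-threaded: spread the same units of work across a pool of
# workers, keeping the compute instance's cores busy.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(work_item, tasks))

assert sequential == parallel  # same results, better resource utilisation
```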
  • As systems scale, the worst enemy is resource contention. Spread the system load across multiple processing units to reduce resource contention. This applies to queues, table/blob stores, relational databases and compute instances.
  • Spreading the load across many components without regard to the data inside the request, according to some load-balancing algorithm. Azure web and worker role endpoints use stateless round-robin distribution.
  • Spreading the load across many components by routing an individual request to a component that owns the data specific to that request. Partitioning is more intelligent and requires data to be routed to a service/resource that knows how to deal with it. It comes in two flavours. Vertical partitioning is typically the first stage of scaling systems out and is simple to execute: you split your application and services functionally across nodes to reduce contention, spreading the load across the functional boundaries of the problem space with separate functions handled by different processing units. Split databases and processing across functional spaces: membership in one database, accounts in another, eventually moving each system module into its own database. Vertical partitioning only works for so long; eventually you run out of functional boundaries to spread your load across and you will begin to get contention in these functionally scaled components. There are scalability targets for each Azure component: storage queues handle 500 messages per second, storage tables and blobs 5,000 messages per second (per partition/account), and SQL Azure's targets are not published, but it will start behaving very badly under excessive load.
  • Spreading a single type of data element across many instances according to some partitioning key, e.g. hashing the player id and doing a modulus operation. Quite often referred to as sharding or data partitioning. You have to relax referential integrity constraints for this to work. This is a complex approach and should only be attempted through the use of a good framework. Examples are the Federations database library (see Chris Auld's presentation) and the partitioned cloud queue.
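The hash-and-modulus routing mentioned above can be sketched like this; `zlib.crc32` stands in for whatever stable hash a real sharding framework would use (Python's built-in `hash()` is randomized per process), and the shard count is illustrative.

```python
import zlib

SHARD_COUNT = 4

def shard_for(player_id: str) -> int:
    """Route a key to a shard with a stable hash plus modulus."""
    return zlib.crc32(player_id.encode("utf-8")) % SHARD_COUNT

# The same key always routes to the same shard...
assert shard_for("player-42") == shard_for("player-42")

# ...and a large key population spreads across all shards.
shards = {shard_for(f"player-{i}") for i in range(1000)}
```

Because related rows land on different shards, cross-shard referential integrity has to be relaxed, which is why the note recommends leaning on a framework rather than hand-rolling this.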
  • Here we can see that the Front-End layer takes incoming requests, and a given front-end server can talk to all of the partition servers it needs to in order to process the incoming requests. The partition layer consists of all of the partition servers, with a master system to perform the automatic load balancing (described below) and assignment of partitions. As shown in the figure, each partition server is assigned a set of object partitions (Blobs, Entities, Queues). The Partition Master constantly monitors the overall load on each partition server as well as the individual partitions, and uses this for load balancing. The lowest layer of the storage architecture is the Distributed File System layer, which stores and replicates the data; all partition servers can access any of the DFS servers.
  • To keep a data center server’s resources from being overloaded and jeopardizing the health of the entire machine, the load on each machine is monitored by the Engine Throttling component. In addition, each database replica is monitored to make sure that statistics such as log size, log write duration, CPU usage, the actual physical database size limit, and the SQL Azure user database size are all below target limits. If the limits are exceeded, the result can be that a SQL Azure database rejects reads or writes for 10 seconds at a time. Occasionally, violation of resource limits may result in the SQL Azure database permanently rejecting reads and writes (depending on the resource type in question).
  • Show the files to be reconciled. Run one process. Show task allocation and timing. Show results. Run 1,000 processes across the scale framework. Show results, ta-dah.
  • Scaling out to many small servers. Requests return as soon as possible from the application servers. Long-running processes.
  • Add database costs
  • Add database costs
  • Inter-role communication utilizing Service Bus event relay and topic-based messaging
  • Chaos Monkey, a tool that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact. Conformity Monkey finds instances that don’t adhere to best-practices and shuts them down. For example, we know that if we find instances that don’t belong to an auto-scaling group, that’s trouble waiting to happen. We shut them down to give the service owner the opportunity to re-launch them properly. Doctor Monkey taps into health checks that run on each instance as well as monitors other external signs of health (e.g. CPU load) to detect unhealthy instances. Once unhealthy instances are detected, they are removed from service and, after giving the service owners time to root-cause the problem, are eventually terminated. Janitor Monkey ensures that our cloud environment is running free of clutter and waste. It searches for unused resources and disposes of them. Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured AWS security groups, and terminates the offending instances. It also ensures that all our SSL and DRM certificates are valid and are not coming up for renewal. 10-18 Monkey (short for Localization-Internationalization, or l10n-i18n) detects configuration and run time problems in instances serving customers in multiple geographic regions, using different languages and character sets. Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone. We want to verify that our services automatically re-balance to the functional availability zones without user-visible impact or manual intervention.
  • The Service Component Compositional Binding Model is founded on a specialized design paradigm comprising the interface-based programming model and associated framework artifacts, such as: a lightweight programming API that includes standard interface definitions, classes and methods; and a compositional binding model that provides the means of registering, discovering, instantiating and managing the lifetime of service components. Such a model enables the participating components to discover and consume other components at run-time. The components can be reused, extended or superseded by other components, allowing composable services to become highly agile and customizable. The model provides the means of abstracting how components resolve each other and interoperate without having to express and lock inter-component bindings at design time.
  • Anti-pattern: considering every message as a blob, always sending the payload to blob storage and using a pointer on the queue. There is potential for compressed messages under 8 KB to go directly onto the queue.
  • Adaptiv has built a scalable reconciliation service on top of the CAT team framework using Pipes and Filters and Scatter-Gather, with more emphasis on the business context.
  • Uses a scatter approach to split up work and distribute it to your processing nodes. The framework was critical as it abstracted away all the complexity of dealing with queues and other resources in a simple and efficient manner. Pipes and Filters: the output of one process becomes the input of the next, allowing us to break up a complex process and chain it together. Scatter-Gather: each process splits its work into one or more tasks that are processed across a pool of compute instances.
  • Show the files to be reconciled. Run one process. Show task allocation and timing. Show results. Run 1,000 processes across the scale framework. Show results, ta-dah.
  • Batching: cost advantages of batching requests together versus lower levels of granularity. Check entity group transactions.
  • Cost benefits of utilising parallel threads versus scaling up.
  • Mixing functional roles: cost advantages of having fully utilised compute instances. Use the extensibility model and multithreading to compose and discover; can change at runtime.
  • Compressing objects will reduce storage costs. Compression occurs when writing to queues and blob stores too.

    1. Scalable Windows Azure Patterns. Presented by Nikolai Blackie @ Auckland Connected Systems User Group. Principal Architect, Adaptiv Integration. 20th of October 2011.
    2. Agenda: Overview the patterns of scalability. Overview the scale targets on Windows Azure and how to scale past these. Introduce the Windows Azure CAT team cloud scalability framework, CloudFx. Demonstrate Adaptiv's scaled-out distributed architecture. Show how you can save money when scaling on Windows Azure.
    3. Traditional Scalability
    4. Traditional Scalability
    5. Traditional Scalability Issues: Synchronous sequential units of work. Large units of work. Tight coupling between components. Stateful applications. Clustering for HA. Expensive. Scaling limits. Inefficient use of resources. How should you approach this in the cloud?
    6. Logical Unit Of Work
    7. Parallel Units Of Work: Single Threaded vs Multi Threaded
    8. Event Driven Processing. Storage Queues ► Enable event driven architectures ► Allow load levelling ► Scale target of 500 messages per second. Service Bus V1 ► Low latency relay ► Non-durable direct eventing. Service Bus V2 ► Durable asynchronous pub-sub and queues
    9. Load Distribution
    10. Load Balancing / Load Sharing: Stateless Round Robin. Network Load Balancer, Web Role x3, Queue, Worker Role x3. Legend: Azure Instance, Unit of Work.
    11. Vertical Partitioning
    12. Vertical Partitioning
    13. Horizontal Partitioning: Partitioned Cloud Queue
    14. Storage Services Scalability Targets. Maximum scale per account: Capacity – up to 100 TB. Transactions – up to 5,000 entities/messages/blobs per second. Bandwidth – up to 3 gigabits per second. Per storage abstraction ► Single Queue – 500 messages per second ► Single Table Partition – 500 entities per second ► Single Blob – 60 MBytes/sec. When targets are reached ► 503 Server Busy – transient, not fatal ► Use Upsert on batch operations.
    15. Storage Services Scalability Targets: Front-End (FE) layer, Partition Layer, Distributed and replicated File System (DFS)
    16. Service Bus Scalability Targets. Quotas per service namespace: Concurrent connections – 100. Number of topics/queues – 10,000. Number of subscriptions per topic – 2,000. Maximum queue size – 1 GB to 5 GB. Throughput targets: Queues, depending on message size, are much faster than storage queues. Topic throughput is dependent on subscription counts. Official guidance is coming soon.
    17. SQL Azure Scalability Targets. No official guidance on I/O performance. SQL Azure is a multi-tenant database platform. Runs on commodity hardware. Throttled on connection overload.
    18. Sharding Techniques
    19. Cloud Scalability – Scale Up
        Azure Role / Cost (NZD) / CPU / Bandwidth / Memory
        XSmall: $0.06 / Shared Core 1.0 GHz / 5 Mbps / 0.768 GB
        Small: $0.14 / 1 Core 1.6 GHz / 100 Mbps / 1.750 GB
        Medium: $0.28 / 2 Cores 1.6 GHz / 200 Mbps / 3.50 GB
        Large: $0.55 / 4 Cores 1.6 GHz / 400 Mbps / 7.00 GB
        XLarge: $1.10 / 8 Cores 1.6 GHz / 800 Mbps / 14.0 GB
    20. Cloud Scalability – Scale Up
        Edition and Database / Cost (NZD)
        Web 1GB: $11.48
        Web 5GB: $57.41
        Business 10GB: $114.93
        Business 20GB: $229.86
        Business 30GB: $344.79
        Business 40GB: $459.72
        Business 50GB: $574.66
    21. Cloud Scalability Key Features: Small logical units of work. Parallel units of work. Event driven processing. Load distribution ► Load balancing ► Vertical partitioning ► Horizontal partitioning. Low cost scale out. Dynamic scale up.
    22. Scalable Frameworks: Windows Azure CAT Team Azure CloudFx Reference Library & Implementation
    23. CloudFx Framework Features ► CloudFx is a cloud solution framework and extensions library ► A Swiss army knife for building scalable systems ► Service agnostic retries ► Large message queue ► Payload compression ► And much more…
    24. Cost-Efficient Queue Listener. Anti-pattern: continuous queue polling when idle. Notify of new work. Provision parallel dequeue threads and increase poll rate. When there is no work, reduce threads and polling rate. Very efficient event-based processing.
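The back-off polling behaviour this slide describes can be sketched as follows; the intervals and back-off factor are illustrative configuration values, not CloudFx defaults, and intervals are returned rather than slept so the behaviour is easy to inspect.

```python
MIN_INTERVAL = 1.0     # seconds between polls while work is flowing
MAX_INTERVAL = 60.0    # ceiling on the idle polling interval
BACKOFF_FACTOR = 2.0

def next_interval(current, got_message):
    """Return the delay before the next queue poll."""
    if got_message:
        return MIN_INTERVAL  # work arrived: poll aggressively again
    # Idle: back off geometrically, capped at the ceiling,
    # so an empty queue costs few storage transactions.
    return min(current * BACKOFF_FACTOR, MAX_INTERVAL)

interval = MIN_INTERVAL
history = []
for got in [False, False, False, True, False]:
    interval = next_interval(interval, got)
    history.append(interval)
# history -> [2.0, 4.0, 8.0, 1.0, 2.0]
```

The cost tables later in the deck quantify why this matters: an idle listener polling at a fixed rate pays for every empty dequeue transaction.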
    25. Reliable Retry Framework. Anti-pattern: failing catastrophically or rerunning entire processes due to service call failure. Call to resource fails. Retry based on configured pattern ► Fixed, Incremental, Exponential. Ensure idempotent operations.
    26. Reliable Retry Framework. Anti-pattern: failing catastrophically or rerunning entire processes due to service call failure. Call to resource fails. Retry based on configured pattern ► Fixed, Incremental, Exponential. Ensure idempotent operations. Expect failure, design for fault tolerance ► Netflix Simian Army ► CloudFx unit tests.
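The three retry patterns the slide names (fixed, incremental, exponential) can be illustrated as simple delay schedules; a real framework such as CloudFx wraps these in configurable policies, so the functions below are only a sketch with illustrative constants.

```python
def fixed(attempt, base=1.0):
    # Same delay before every retry attempt.
    return base

def incremental(attempt, base=1.0, step=2.0):
    # Delay grows by a constant step per attempt.
    return base + step * attempt

def exponential(attempt, base=1.0):
    # Delay doubles with each attempt, easing load on a struggling service.
    return base * (2 ** attempt)

schedules = {
    "fixed":       [fixed(n) for n in range(4)],
    "incremental": [incremental(n) for n in range(4)],
    "exponential": [exponential(n) for n in range(4)],
}
# fixed       -> [1.0, 1.0, 1.0, 1.0]
# incremental -> [1.0, 3.0, 5.0, 7.0]
# exponential -> [1.0, 2.0, 4.0, 8.0]
```

Whatever the schedule, the retried operation must be idempotent, as the slide stresses: a retry may re-execute work that partially succeeded.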
    27. Service Aggregation. Anti-pattern: tightly coupled service deployments. Enable flexible service aggregation. Implemented using System.ServiceModel extensions ► IExtensible<T> ► IExtensibleObject<T>. Consolidate services or partition vertically.
    28. Large Message Queue Support. Anti-pattern: writing custom code to handle the queue storage limit of 64 KB message size. Abstracted support for messages of any size. Built-in compression when writing to all services. Utilises DeflateStream. Compress on write, decompress on read.
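The deflate algorithm behind .NET's DeflateStream is also exposed by Python's zlib module, so the compress-on-write, decompress-on-read round trip on a CSV-like payload can be sketched as:

```python
import zlib

# A repetitive CSV-style payload, similar to the reconciliation files
# discussed later in the deck (column names and rows are made up).
payload = ("id,name,amount\n" +
           "\n".join(f"{i},item-{i},{i * 10}" for i in range(500))).encode("utf-8")

compressed = zlib.compress(payload)     # compress before writing to queue/blob
restored = zlib.decompress(compressed)  # decompress on read

assert restored == payload
assert len(compressed) < len(payload)   # repetitive text compresses well
```

Shrinking the payload both cuts storage costs and makes it more likely a message fits under the queue's size limit without the blob-pointer detour.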
    29. Scalable Reconciliation Implementation
    30. Scalable Reconciliation Implementation
    31. Scalable Reconciliation Implementation
    32. Reconciliation Scale Out Demo
    33. Cost Capacity Planning. Storage transactions ► Queues: 1 transaction to put, 2 transactions to get and delete; can batch up to 32 get-message operations in 1 transaction ► Table storage: 1 transaction for read/write; can batch 100 entity operations into 1 group transaction ► Blobs: 1 transaction for read/write. Bandwidth ► Measured from outside and between data centers ► Free inbound data.
    34. Cost Capacity Planning. Storage ► Cumulative total through the month, charged on average usage. Compute instances ► Charged as soon as a virtual instance is allocated, regardless of running state ► Billed to the nearest hour. Measurement ► Instrument Azure service access ► Use billing manager with an A/B testing approach.
    35. Cost-Efficient Queue Listener
        Idle Polling* (Polling Algorithm / Transactions Per Day / Service Bus / Cost Per Day)
          Standard – One Dequeue Thread: 79,200 / – / $0.09
          Standard – 10 Compute Instances, 150 Dequeue Threads: 11,880,000 / – / $13.66
          Back off – One Dequeue Thread: – / $4.59 / $0.15
          Back off – 10 Compute Instances, 150 Dequeue Threads: – / $22.87 / $0.76
          * 22 hours per day
        Running Polling** (Polling Algorithm / Transactions Per Day / Cost Per Day)
          One Dequeue Thread: 72,000 / $0.08
          10 Compute Instances, 150 Dequeue Threads: 10,800,000 / $12.41
          ** 2 hours per day, 5 msgs per sec, 2 trans per message
        Savings (Polling Algorithm / Transactions Per Day / Cost Per Day)
          One Dequeue Thread: 79,200 / $(0.06)
          10 Compute Instances, 150 Dequeue Threads: 11,880,000 / $12.89
    36. Cost-Efficient Queue Listener. * Terms and conditions apply, your mileage may vary, these calculations are based on simple models.
    37. Batching Units of Work
        Parsing a 10,000-line file ► 10,000 messages a day ► Table storage batch transactions ► Using the scatter scale-out pattern
        Batch Size / Queue Transactions / Table Transactions / Total Transactions / Cost Per Day
        10 Lines: 30,000,000 / 10,000,000 / 40,000,000 / $45.98
        200 Lines: 1,500,000 / 500,000 / 2,000,000 / $2.30
        2000 Lines: 150,000 / 50,000 / 200,000 / $0.23
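The transaction counts in this table follow from the per-operation costs on slide 33: each queue message costs 1 put plus 2 get/delete transactions, and each batch is one table group transaction. This sketch reproduces the arithmetic under the slide's stated workload of 10,000 files of 10,000 lines per day.

```python
FILES_PER_DAY = 10_000
LINES_PER_FILE = 10_000

def daily_transactions(batch_size):
    """Daily (queue, table, total) transaction counts for a given batch size."""
    batches = FILES_PER_DAY * (LINES_PER_FILE // batch_size)
    queue_tx = batches * 3   # 1 put + 2 transactions to get and delete
    table_tx = batches * 1   # one entity group transaction per batch
    return queue_tx, table_tx, queue_tx + table_tx

# batch of 10 lines  -> 30,000,000 queue + 10,000,000 table = 40,000,000
# batch of 200 lines ->  1,500,000 queue +    500,000 table =  2,000,000
# batch of 2000 lines->    150,000 queue +     50,000 table =    200,000
```

Growing the batch size 200x cuts the transaction bill by the same factor, which is the cost lever the slide is illustrating.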
    38. Batching Units of Work. * Terms and conditions apply, your mileage may vary, these calculations are based on simple models.
    39. Parallel Processing & Mixing Roles
        Utilisation (per day) / Cost:
          5 Threads Dequeuing across 5 Instances: $16.55
          25 Threads Dequeuing across 1 Instance: $3.31
        Utilisation (per day) / Cost:
          One Service over 3 Instances: $9.93
          Three Services on One Instance: $3.31
        * Terms and conditions apply, your mileage may vary, these calculations are based on simple models.
    40. Object Compression
        Compression Ratios ► Compressing a text-based CSV: 5:1 ratio ► Compressing an XML file: 10:1 ratio
        Based on 10,000 CSV messages a day
        10000-Line CSV File / Blob / Storage Space (GB) / Cost Per Day
        Uncompressed: 2064 KB / 19.68 / $3.39
        Compressed: 390 KB / 3.72 / $0.64
        Savings: 15.96 / $2.75
    41. Object Compression. * Terms and conditions apply, your mileage may vary, these calculations are based on simple models.
    42. To Summarise: You can scale with any combination of up and out, using horizontal and vertical partitioning. On the cloud, make the most of the ability to scale out using small units of work. Distribute load to reduce resource contention. Make the best use of the resources you have paid for. Use frameworks like CloudFx to help you scale using best practices. Use techniques like back-off polling, batching, parallel processing and compression to reduce costs.
    43. Links: Windows Azure Capacity Assessment. Building Highly Scalable Java Applications on Windows Azure. Cost Architecting for Windows Azure. Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity. Operational costs of an Azure Message Queue.
    44. Links: Windows Azure Storage Abstractions and their Scalability Targets. Understanding the Scalability, Availability, Durability, and Billing of Windows Azure Storage. Windows Azure Storage Architecture Overview. Windows Azure AppFabric Service Bus Quotas. Inside SQL Azure. The Netflix Simian Army.
    45. Windows Azure CAT Links: How to Simplify & Scale Inter-Role Communication Using Windows Azure AppFabric Service Bus. Implementing Storage Abstraction Layer to Support Very Large Messages in Windows Azure Queues. Transient Fault Handling Framework for SQL Azure, Windows Azure Storage, AppFabric Service Bus. Best Practices for Maximizing Scalability and Cost Effectiveness of Queue-Based Messaging Solutions on Windows Azure.
    46. We would like to thank the sponsors… Premier Partners, Associated Partners, Supporting Partners.