Windows Azure Storage: Overview, Internals, and Best Practices


Session prepared for SQLSaturday Bulgaria

Published in: Technology, Business


  1. 1. Windows Azure Storage Overview, Internals and Best Practices
  2. 2. Sponsors
  3. 3. About me
      - Program Manager @ Edgar Online, RRD
      - Windows Azure MVP
      - Co-organizer of Odessa .NET User Group
      - Ukrainian IT Awards 2013 Winner – Software Engineering
  4. 4. What is Windows Azure Storage?
  5. 5. Windows Azure Storage
      - Cloud Storage: anywhere and anytime access
        - Blobs, Disks, Tables and Queues
      - Highly Durable, Available and Massively Scalable
        - Easily build “internet scale” applications
        - 10 trillion stored objects
        - 900K requests/sec on average (2.3+ trillion per month)
      - Pay for what you use
      - Exposed via easy and open REST APIs
      - Client libraries in .NET, Java, Node.js, Python, PHP, Ruby
  6. 6. Abstractions – Blobs and Disks
  7. 7. Abstractions – Tables and Queues
  8. 8. Data centers
  9. 9. Windows Azure Data Storage Concepts
      - Account
        - Container > Blobs: https://<account><container>
        - Table > Entities: https://<account><table>
        - Queue > Messages: https://<account><queue>
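
The account > container/table/queue hierarchy above maps directly onto resource URLs. The following is a minimal sketch of that mapping, not code from the session. The URLs in the slide text are truncated, so the host suffixes used here are an assumption (the usual public-cloud endpoints), and `resource_url` is a hypothetical helper name.

```python
# Sketch: build resource URLs for the account -> container/table/queue hierarchy.
# ASSUMPTION: the host suffixes below are the usual public-cloud endpoints;
# the slide text itself only shows the truncated form https://<account><container>.
ENDPOINT_SUFFIX = {
    "blob": "blob.core.windows.net",
    "table": "table.core.windows.net",
    "queue": "queue.core.windows.net",
}


def resource_url(account: str, service: str, resource: str, item: str = "") -> str:
    """Return the HTTPS URL of a container, table, or queue (hypothetical helper).

    `item` optionally names a blob inside a container.
    """
    host = f"{account}.{ENDPOINT_SUFFIX[service]}"
    path = f"{resource}/{item}" if item else resource
    return f"https://{host}/{path}"
```

For example, `resource_url("myaccount", "blob", "photos", "cat.jpg")` addresses one blob inside the `photos` container of the `myaccount` account.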
  10. 10. How is Azure Storage used by Microsoft?
  11. 11. Internals
  12. 12. Design Goals
      - Highly Available with Strong Consistency
        - Provide access to data in the face of failures/partitioning
      - Durability
        - Replicate data several times within and across regions
      - Scalability
        - Need to scale to zettabytes
        - Provide a global namespace to access data around the world
        - Automatically scale out and load balance data to meet peak traffic demands
  13. 13. Windows Azure Storage Stamps
      - Access blob storage via the URL: http://<account>
      - [Diagram: data access requests and the Storage Location Service reach two storage stamps. Each stamp contains a load balancer (LB), Front-Ends, a Partition Layer, and a DFS Layer with intra-stamp replication. Inter-stamp (geo) replication runs between the two stamps.]
  14. 14. Architecture Layers inside Stamps
      - [Diagram labels: Partition Layer, Index]
  15. 15. Availability with Consistency for Writing
      - All writes are appends to the end of a log, which is an append to the last extent in the log
      - Write Consistency across all replicas for an extent:
        - Appends are ordered the same across all 3 replicas for an extent (file)
        - Only return success if all 3 replica appends are committed to storage
        - When an extent gets to a certain size, or on write failure/LB, seal the extent’s replica set and never append any more data to it
      - Write Availability: to handle failures during a write
        - Seal the extent’s replica set
        - Append immediately to a new extent (replica set) on 3 other available nodes
        - Add this new extent to the end of the partition’s log (stream)
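
The write path above (append to the last extent, acknowledge only when all 3 replicas succeed, seal and move on when anything fails) can be sketched as a small in-memory simulation. This is an illustrative toy, not the service's implementation: `Node`, `Extent`, `Stream`, and the `max_records` seal threshold are hypothetical names, and the new extent simply takes the first three healthy nodes rather than "3 other" nodes.

```python
from dataclasses import dataclass, field

REPLICAS_PER_EXTENT = 3  # from the slide: every extent has 3 replicas


class ReplicaDown(Exception):
    """Raised by a simulated node that cannot accept an append."""


@dataclass
class Node:
    name: str
    healthy: bool = True
    blocks: dict = field(default_factory=dict)  # extent id -> list of records

    def append(self, extent_id, record):
        if not self.healthy:
            raise ReplicaDown(self.name)
        self.blocks.setdefault(extent_id, []).append(record)


@dataclass
class Extent:
    extent_id: int
    replicas: list
    sealed: bool = False
    committed: int = 0  # records acknowledged to the client; reads stop here


class Stream:
    """A partition's log: an ordered list of extents, only the last is writable."""

    def __init__(self, nodes, max_records=4):
        self.nodes = nodes
        self.max_records = max_records  # hypothetical size limit before sealing
        self.extents = []
        self._new_extent()

    def _new_extent(self):
        healthy = [n for n in self.nodes if n.healthy]
        if len(healthy) < REPLICAS_PER_EXTENT:
            raise RuntimeError("not enough healthy nodes for a replica set")
        self.extents.append(Extent(len(self.extents), healthy[:REPLICAS_PER_EXTENT]))

    def append(self, record):
        """Append a record and return the id of the extent that committed it."""
        while True:
            extent = self.extents[-1]
            try:
                for node in extent.replicas:  # same order on every replica
                    node.append(extent.extent_id, record)
            except ReplicaDown:
                # Write failure: seal at the last committed length, retry elsewhere.
                extent.sealed = True
                self._new_extent()
                continue
            extent.committed += 1  # success only after all 3 replicas appended
            if extent.committed >= self.max_records:
                extent.sealed = True
                self._new_extent()
            return extent.extent_id
```

A replica that received a record before a sibling failed may hold data beyond `committed`; in this sketch readers are expected to stop at `committed`, which is what keeps the replicas identical for all acknowledged data.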
  16. 16. Availability with Consistency for Reading
      - Read Consistency: can read from any replica, since the data in each replica of an extent is bit-wise identical
      - Read Availability: send out parallel read requests if the first read is taking longer than the 95th-percentile latency
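
The read-availability rule above is a form of hedged request: start with one replica and fan out only when it is slower than the 95th-percentile latency. A minimal sketch using Python threads follows. `hedged_read` and its parameters are hypothetical names, the replicas are modelled as plain callables, and error handling is left out for brevity.

```python
import concurrent.futures as cf
import time


def hedged_read(replicas, p95_seconds):
    """Read from the first replica; if it exceeds p95_seconds, race the others.

    `replicas` is a list of zero-argument callables that all return the same
    bytes (replicas of an extent are bit-wise identical, so any answer is valid).
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(replicas[0])]
        done, _ = cf.wait(futures, timeout=p95_seconds)
        if not done:
            # First read is slower than the p95 budget: start parallel reads.
            futures += [pool.submit(read) for read in replicas[1:]]
            done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        pool.shutdown(wait=False)  # do not block on the slow replica
```

Because extra reads start only past the latency budget, the added load on the replicas stays small in the common case.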
  17. 17. Dynamic Load Balancing – Partition Layer
      - Spreads index/transaction processing across partition servers
        - Master monitors traffic load/resource utilization on partition servers
        - Dynamically load balance partitions across servers to achieve better performance/availability
        - Does not move data around, only reassigns what part of the index a partition server is responsible for
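
The key point above is that rebalancing changes ownership of index ranges, not the location of data. A toy sketch of that idea follows. `PartitionMap`, its split points, and the "move the hottest range to the least-loaded server" policy are all assumptions made for illustration; the actual balancing policy is not described in the slides.

```python
import bisect


class PartitionMap:
    """Maps contiguous key ranges of the index to partition servers (toy model)."""

    def __init__(self, split_points, servers):
        # Range i covers keys from split_points[i-1] (inclusive) up to
        # split_points[i] (exclusive); the first and last ranges are open-ended.
        assert len(servers) == len(split_points) + 1
        self.split_points = list(split_points)
        self.owners = list(servers)

    def server_for(self, key):
        """Return the server currently responsible for `key`."""
        return self.owners[bisect.bisect_right(self.split_points, key)]

    def rebalance(self, load_by_range, all_servers):
        """Reassign the hottest range to the least-loaded server.

        Only the ownership entry changes; no data is copied or moved.
        """
        load = {server: 0 for server in all_servers}
        for i, owner in enumerate(self.owners):
            load[owner] += load_by_range[i]
        hottest = max(range(len(self.owners)), key=lambda i: load_by_range[i])
        target = min(all_servers, key=lambda server: load[server])
        self.owners[hottest] = target
        return hottest, target
```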
  18. 18. Dynamic Load Balancing – DFS Layer
      - DFS read load balancing across replicas
        - Monitor latency/load on each node/replica; dynamically select which replica to read from, and start additional reads in parallel based on the 95th-percentile latency
  19. 19. Architecture Summary
      - Durability: all data is stored with at least 3 replicas
      - Consistency: all committed data is identical across all 3 replicas
      - Availability: can read from any of the 3 replicas; if there are any issues writing, seal the extent and continue appending to a new extent
      - Performance/Scale: retry based on 95th-percentile latencies; auto scale out and load balance based on load/capacity
      - Additional details can be found in the SOSP paper:
        - “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
  20. 20. Best Practices
  21. 21. General .NET Best Practices For Azure Storage
      - Disable Nagle for small messages (< 1400 bytes)
        - ServicePointManager.UseNagleAlgorithm = false;
      - Disable Expect 100-Continue*
        - ServicePointManager.Expect100Continue = false;
      - Increase default connection limit
        - ServicePointManager.DefaultConnectionLimit = 100; (or more)
      - Take advantage of .NET 4.5 GC
        - GC performance is greatly improved
        - Background GC:
  22. 22. General Best Practices
      - Locate storage accounts close to compute/users
      - Understand account scalability targets
        - Use multiple storage accounts to get more
        - Distribute your storage accounts across regions
      - Consider heating up the storage for better performance
      - Cache critical data sets
        - To get more requests/sec than the account/partition targets
        - As a backup data set to fall back on
      - Distribute load over many partitions and avoid spikes
  23. 23. General Best Practices (cont.)
      - Use HTTPS
      - Optimize what you send & receive
        - Blobs: Range reads, Metadata, Head Requests
        - Tables: Upsert, Projection, Point Queries
        - Queues: Update Message
      - Control parallelism at the application layer
        - Unbounded parallelism can lead to high latencies and throttling
      - Enable Logging & Metrics on each storage service
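
One common way to control parallelism at the application layer is to put a semaphore in front of every storage request, so the number of requests in flight never exceeds a chosen bound. The sketch below uses `asyncio`. `run_bounded` and `max_in_flight` are hypothetical names, and the request functions stand in for whatever storage calls the application makes.

```python
import asyncio


async def run_bounded(tasks, max_in_flight):
    """Run coroutine factories with at most `max_in_flight` running at once.

    `tasks` is a list of zero-argument callables that each return a coroutine
    (for example, one storage request per callable). Results keep input order.
    """
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(factory):
        async with gate:  # wait here instead of flooding the service
            return await factory()

    return await asyncio.gather(*(guarded(factory) for factory in tasks))
```

The right bound depends on the workload and the account and partition targets, so it is best treated as a tunable setting rather than a constant.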
  24. 24. Blob Best Practices
      - Try to match your read size with your write size
        - Avoid reading small ranges on blobs with large blocks
        - CloudBlockBlob.StreamMinimumReadSizeInBytes / StreamWriteSizeInBytes
      - How do I upload a folder the fastest?
        - Upload multiple blobs simultaneously
      - How do I upload a blob the fastest?
        - Use parallel block upload
      - Concurrency (C): multiple workers upload different blobs
      - Parallelism (P): multiple workers upload different blocks for the same blob
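
Parallel block upload starts by cutting the blob into blocks that independent workers can send. The sketch below only plans the blocks and makes no service calls. `plan_blocks` is a hypothetical helper, and the zero-padded counter is one possible ID scheme, chosen because block IDs are Base64 strings that must all have the same length within a blob.

```python
import base64


def plan_blocks(blob_size, block_size):
    """Return a list of (block_id, offset, length) tuples covering the blob.

    Each tuple can be handed to a different worker (the P in the slide); the
    ordered list of block IDs is what gets committed once every block is up.
    """
    count = -(-blob_size // block_size)  # ceiling division
    plan = []
    for i in range(count):
        offset = i * block_size
        length = min(block_size, blob_size - offset)  # last block may be short
        # Fixed-width counter so every encoded ID has the same length.
        block_id = base64.b64encode(f"{i:08d}".encode()).decode()
        plan.append((block_id, offset, length))
    return plan
```

For a folder of many blobs, the same plan can be combined with concurrency across blobs (C), which is the dimension that scaled best in the measurements on the next slide.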
  25. 25. Concurrency Vs. Blob Parallelism
      - Test: XL VM uploading 512 blobs of 256 MB each (total upload size = 128 GB)
        - C=1, P=1 => averaged ~13.2 MB/s
        - C=1, P=30 => averaged ~50.72 MB/s
        - C=30, P=1 => averaged ~96.64 MB/s
      - [Chart: upload time in seconds (axis 0 to 10000) for the three configurations]
      - A single TCP connection is bound by TCP rate control & RTT
      - C=30 vs. P=30: the C=30 test completed almost twice as fast
      - A single blob is bound by the limits of a single partition
      - Accessing multiple blobs concurrently scales
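
The "almost twice as fast" observation follows from the averaged throughput figures. The short calculation below estimates the total upload time for each configuration. It assumes the 128 GB total is 512 x 256 MB with 1 GB = 1024 MB and that the averaged rates held for the whole run, so the results are estimates rather than measured times.

```python
# Estimate upload time from the averaged throughput figures on the slide.
# ASSUMPTION: 128 GB = 512 blobs x 256 MB, and the average rate is sustained.
TOTAL_MB = 512 * 256

RATES_MB_PER_S = {
    "C=1, P=1": 13.2,
    "C=1, P=30": 50.72,
    "C=30, P=1": 96.64,
}


def estimated_seconds(rate_mb_per_s, total_mb=TOTAL_MB):
    """Total transfer time in seconds at a constant average rate."""
    return total_mb / rate_mb_per_s


def speedup(slower_rate, faster_rate):
    """How many times faster the run at faster_rate finishes."""
    return estimated_seconds(slower_rate) / estimated_seconds(faster_rate)
```

Under these assumptions the three runs take roughly 9930 s, 2584 s, and 1356 s, which fits the 0 to 10000 s axis of the chart and gives C=30 a speedup of about 1.9x over P=30.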
  26. 26. Blob Download
      - Test: XL VM downloading 50 blobs of 256 MB each (total download size = 12.5 GB)
        - C=1, P=1 => averaged ~96 MB/s
        - C=30, P=1 => averaged ~130 MB/s
      - [Chart: download time in seconds (axis 0 to 140) for C=1, P=1 and C=30, P=1]
  27. 27. Table Best Practices
      - Critical Queries: choose PartitionKey and RowKey to avoid hotspots
        - Table scans are expensive; avoid them at all costs for latency-sensitive scenarios
      - Batch: same PartitionKey for entities that need to be updated together
      - Schema-less: store multiple types in the same table
      - Single Index – {PartitionKey, RowKey}: if needed, concatenate columns to form composite keys
      - Entity Locality: {PartitionKey, RowKey} determines sort order
        - Store related entities together to reduce IO and improve performance
      - Table Service Client Layer in 2.1 and 2.2: dramatic performance improvements and a better NoSQL interface
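
Because {PartitionKey, RowKey} is the only index and keys are compared as strings, composite keys are usually built so that the string order matches the query order. The sketch below shows one such scheme for "newest order first per customer". The key layout, the inverted timestamp, and the helper names are illustrative assumptions, not something prescribed by the slides.

```python
MAX_TS = 10**10 - 1  # assumed upper bound for a 10-digit Unix timestamp


def order_keys(customer_id, timestamp_s, order_id):
    """Build (PartitionKey, RowKey) for an order entity (hypothetical scheme).

    - PartitionKey groups one customer's entities so they can be batched
      together and read with a single partition range query.
    - RowKey concatenates an inverted, zero-padded timestamp with the order id,
      so plain string ordering returns the newest orders first.
    """
    partition_key = f"customer-{customer_id}"
    row_key = f"{MAX_TS - timestamp_s:010d}_{order_id}"
    return partition_key, row_key


def point_lookup(table, partition_key, row_key):
    """A point query names both keys; here `table` is a dict keyed by the pair."""
    return table.get((partition_key, row_key))
```

Zero-padding matters: without it, string comparison would sort "999" after "1000".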
  28. 28. Queue Best Practices
      - Make message processing idempotent: messages become visible again if the client worker fails to delete them
      - Benefit from Update Message: extend the visibility time based on the message, or save intermediate state
      - Message Count: use this to scale workers
      - Dequeue Count: use it to identify poison messages or to check the validity of the invisibility time used
      - Blobs to store large messages: increase throughput by having larger batches
      - Multiple Queues: to get more than a single queue (partition) target
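
The first and fourth points above interact: a message whose handler keeps failing reappears after each visibility timeout, so the worker needs both an idempotency guard and a dequeue-count limit. The sketch below simulates this with an in-memory queue and a logical clock. `FakeQueue`, `work_once`, the `done_ids` set, and the limit of 3 dequeues are hypothetical; a real worker would keep the processed-ID record in durable storage.

```python
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: str
    body: str
    visible_at: float = 0.0
    dequeue_count: int = 0


class FakeQueue:
    """In-memory stand-in for a queue with visibility timeouts."""

    def __init__(self):
        self.messages = []

    def put(self, msg_id, body):
        self.messages.append(Message(msg_id, body))

    def get(self, now, visibility_timeout):
        for m in self.messages:
            if m.visible_at <= now:
                m.visible_at = now + visibility_timeout  # hide while in progress
                m.dequeue_count += 1
                return m
        return None

    def delete(self, msg_id):
        self.messages = [m for m in self.messages if m.msg_id != msg_id]


def work_once(queue, now, handle, done_ids, poison, timeout=30, max_dequeues=3):
    """Process one visible message. Return False when nothing is visible."""
    m = queue.get(now, timeout)
    if m is None:
        return False
    if m.dequeue_count > max_dequeues:
        poison.append(m)  # set aside instead of retrying forever
        queue.delete(m.msg_id)
        return True
    if m.msg_id not in done_ids:  # idempotency guard: never redo finished work
        handle(m.body)  # may raise; the message reappears after the timeout
        done_ids.add(m.msg_id)
    queue.delete(m.msg_id)
    return True
```

If the worker crashes between `handle` and `delete`, the message is delivered again, and the `done_ids` check is what turns that redelivery into a no-op.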
  29. 29. Thank you! Q&A