Windows Azure Table Storage – Deep Dive


This is a deep-dive session on Windows Azure Table Storage given by Sundararajan S at B.Net DevCon.

Published in: Technology

  1. Windows Azure Table Storage – Deep Dive
     Sundararajan Subramanian
     Associate Technical Architect
  2. Azure Storage
     • Tables – Provide structured storage. A table is a set of entities, each of which contains a set of properties.
     • Queues – Provide reliable storage and delivery of messages for an application.
     • Blobs – Provide a simple interface for storing named files along with metadata for each file.
     • Drives – Provide durable NTFS volumes for Windows Azure applications to use.
  3. Windows Azure Tables
     • Provides structured storage
     • Massively scalable tables
         Billions of entities (rows) and TBs of data
         Can use thousands of servers as traffic grows
     • Highly available and durable
         Data is replicated several times
     • Familiar and easy-to-use API
         ADO.NET Data Services – .NET 3.5 SP1
           .NET classes and LINQ
         REST – with any platform or language
  4. Table Storage Concepts
     [Diagram: Accounts contain Tables, which contain Entities]
     Account: sundars
       Table: BlogPosts
         Entity: Blogtitle = …, Name = …
         Entity: Blogtitle = …, Name = …
       Table: Users
         Entity: Name = …, Id = …
         Entity: Name = …, Id = …
  5. Table Data Model
     • Table
         A storage account can create many tables
         Table name is scoped by account
         Set of entities (i.e. rows)
     • Entity
         Set of properties (columns)
         Required properties: PartitionKey, RowKey and Timestamp
  6. Required Entity Properties
     • PartitionKey & RowKey
         Together they uniquely identify an entity
         They define the sort order
         Use them to scale your application
     • Timestamp
         Read only
         Used for optimistic concurrency
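The optimistic concurrency mentioned for Timestamp can be illustrated with a small in-memory sketch. This is not the storage service's implementation: the `Store` class, the integer version counter, and `ConflictError` are hypothetical stand-ins for the version token the service tracks for each entity (described as an ETag on a later slide).

```python
class ConflictError(Exception):
    """Raised when a writer's version token no longer matches the stored one."""


class Store:
    """Hypothetical in-memory store: key -> (version, entity)."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        # Returns the (version, entity) pair currently stored under key.
        return self._rows[key]

    def write(self, key, entity, expected_version=None):
        # The write succeeds only if the caller saw the latest version.
        current = self._rows.get(key)
        current_version = current[0] if current else None
        if current_version != expected_version:
            raise ConflictError(f"expected {expected_version}, found {current_version}")
        new_version = (current_version or 0) + 1
        self._rows[key] = (new_version, dict(entity))
        return new_version


store = Store()
key = ("sundars", "post-001")
store.write(key, {"Title": "Draft"})          # first insert, no prior version

version_a, _ = store.read(key)                # two clients read the same version
version_b, _ = store.read(key)

store.write(key, {"Title": "Edited by A"}, expected_version=version_a)
# A second write that still carries version_b would now raise ConflictError.
```

The losing client is expected to re-read the entity, reapply its change, and try again.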
  7. PartitionKey and Partitions
     • PartitionKey
         Used to group entities in the table into partitions
     • A table partition
         All entities with the same partition key value
         Unit of scale
         Controls entity locality
     • RowKey provides uniqueness within a partition
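A toy model may make the grouping concrete. The sketch below is an assumption-laden illustration, not the service's storage layout: the `Table` class and the sample entities are hypothetical. It groups entities by PartitionKey, enforces RowKey uniqueness inside each partition, and scans in (PartitionKey, RowKey) order.

```python
from collections import defaultdict


class Table:
    """Toy in-memory table: entities are grouped into partitions by PartitionKey."""

    def __init__(self):
        self.partitions = defaultdict(dict)   # PartitionKey -> {RowKey: entity}

    def insert(self, entity):
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        if rk in self.partitions[pk]:
            # RowKey must be unique within its partition.
            raise KeyError(f"entity ({pk!r}, {rk!r}) already exists")
        self.partitions[pk][rk] = entity

    def scan(self):
        """Yield entities ordered by PartitionKey, then RowKey."""
        for pk in sorted(self.partitions):
            for rk in sorted(self.partitions[pk]):
                yield self.partitions[pk][rk]


table = Table()
table.insert({"PartitionKey": "sundars", "RowKey": "post-002", "Title": "Second"})
table.insert({"PartitionKey": "sundars", "RowKey": "post-001", "Title": "First"})
table.insert({"PartitionKey": "alice", "RowKey": "post-001", "Title": "Hello"})

ordered_keys = [(e["PartitionKey"], e["RowKey"]) for e in table.scan()]
```

The same RowKey ("post-001") appears in two partitions without conflict, because uniqueness is only required for the (PartitionKey, RowKey) pair.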
  8. Partitioning Tables
     [Diagram: a table split into Partition 1 and Partition 2]
  9. Partitioning – Why?
     • Scalability
         Individual partitions are distributed across multiple storage nodes.
         The system monitors partition usage and automatically balances partitions across storage nodes.
         A partition, i.e. all entities with the same partition key, is served by a single node.
     • Entity Group Transactions
         Allow the application to atomically perform multiple Create/Update/Delete operations across multiple entities in a single batch request to the storage system.
     • Entity Locality
  10. Choosing a Partition Key
      • Entity Group Transactions
      • Efficient queries
      • Scalability
  11. Table Operations
      • Table
          Create
          Query
          Delete
      • Entities
          Insert
          Update
            Merge – partial update: SaveChanges()
            Replace – update the entire entity: SaveChanges(SaveChangesOptions.ReplaceOnUpdate)
          Delete
          Query
          Entity Group Transaction
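The difference between the Merge and Replace update modes can be sketched with plain dictionaries. The two helper functions below are hypothetical illustrations of the semantics only; they are not part of any client library.

```python
def merge_update(stored, changes):
    """Partial update: properties missing from `changes` keep their stored values."""
    updated = dict(stored)
    updated.update(changes)
    return updated


def replace_update(stored, changes):
    """Full update: the entity keeps its keys but otherwise becomes exactly `changes`."""
    updated = {"PartitionKey": stored["PartitionKey"], "RowKey": stored["RowKey"]}
    updated.update(changes)
    return updated


stored = {"PartitionKey": "sundars", "RowKey": "post-001", "Title": "Draft", "Body": "Hello"}
changes = {"Title": "Final"}

merged = merge_update(stored, changes)      # Body survives
replaced = replace_update(stored, changes)  # Body is dropped
```

Merge suits clients that only know about some of an entity's properties; Replace suits clients that own the whole entity.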
  12. Demo
      Blog Engine – Windows Azure Table Storage
  13. DataServiceContext – Best Practices
      • Do not share the DataServiceContext object across threads
      • Keep its lifetime short
      • Use a separate DataServiceContext object for each operation
          If a DataServiceContext object is shared across multiple operations, an operation that failed in one call is retried during the subsequent SaveChanges.
      • Give the entity class the same name as the table for best performance
  14. Concurrent Updates
      • An ETag is sent with each result set
      • When the client updates a retrieved entity, it sends the ETag back to the server
      • The server checks it against the ETag of the persisted entity before applying the update
      • If there is a mismatch, the server throws an exception
  15. Query Speed
      • FAST
          Single PartitionKey and RowKey with equality
      • MEDIUM
          Single partition but a small range for RowKey
          Entire partition or table that is small
      • SLOW
          Large single-partition scan
          Large table scan
          “OR” predicates on keys => no query optimization => results in a scan
      • Expect continuation tokens
  16. Make Queries Faster
      • Large scans
          Split the range and parallelize the queries
          Create and maintain your own views that help queries
      • “OR” predicates
          Execute the individual queries in parallel instead of using “OR”
      • User-interactive queries
          Cache the results to reduce scan frequency
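The advice to run individual queries in parallel instead of one "OR" query can be sketched as follows. This is a minimal illustration against a hypothetical in-memory `TABLE`; `point_query` stands in for a fast lookup with equality on both PartitionKey and RowKey, and a real client would issue one network request per key.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory table keyed by (PartitionKey, RowKey).
TABLE = {
    ("alice", "001"): {"Title": "Hello"},
    ("bob", "007"): {"Title": "Queues"},
    ("sundars", "042"): {"Title": "Tables"},
}


def point_query(key):
    """Fast path: equality on both PartitionKey and RowKey."""
    return TABLE.get(key)


def fetch_many(keys, max_workers=4):
    """Run one point query per key in parallel instead of a single OR-ed scan."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(point_query, keys))   # keeps the order of `keys`
    return [entity for entity in results if entity is not None]


wanted = [("alice", "001"), ("sundars", "042"), ("nobody", "000")]
found = fetch_many(wanted)
```

Each lookup stays on the fast path, and the client merges the results itself.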
  19. Continuation Tokens
      A response carries a continuation token when:
      • The maximum of 1000 rows in a response is reached
      • The scan reaches the end of a partition range boundary
      • The maximum of 5 seconds to execute the query is exceeded
      Always expect a continuation token.
        If the query times out, the server returns a continuation token so that the client can make another query.
        When the scan crosses a partition boundary, a continuation token is returned.
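A client therefore has to loop until no token comes back. The sketch below simulates that loop under stated assumptions: `ROWS`, `MAX_ROWS` (set to 3 here in place of the service's 1000), and `query_page` are hypothetical, and the token is modeled as the (PartitionKey, RowKey) pair where the next page starts. The time limit is not simulated.

```python
# Hypothetical sorted table contents: two partitions with five rows each.
ROWS = sorted((pk, f"{i:03d}") for pk in ("alice", "bob") for i in range(5))
MAX_ROWS = 3   # stand-in for the service's 1000-row limit


def query_page(token=None):
    """Return (page, next_token); a page ends at MAX_ROWS or a partition boundary."""
    start = 0 if token is None else ROWS.index(token)
    page = []
    for row in ROWS[start:]:
        if len(page) == MAX_ROWS or (page and row[0] != page[0][0]):
            return page, row          # token: first key of the next page
        page.append(row)
    return page, None                 # no token: the result set is complete


def query_all():
    """Follow continuation tokens until the server stops returning one."""
    results, pages, token = [], 0, None
    while True:
        page, token = query_page(token)
        results.extend(page)
        pages += 1
        if token is None:
            return results, pages


all_rows, page_count = query_all()
```

A page can come back short, so the presence of a token, not the page size, decides whether to continue.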
  23. Pagination
      • Use IQueryable<T>.Take(N) to fetch the top N results
      • Use continuation tokens to fetch the following pages:
          http://<serviceUri>/Blogs?<originalQuery>&NextPartitionKey=<someValue>&NextRowKey=<someOtherValue>
  25. Tips and Tricks
      Windows Azure Table Storage
  26. Retrieve Latest Items
      Use the following as the row key:
      DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks
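Because entities are sorted by key, a row key that shrinks as time advances puts the newest item first. The sketch below reproduces the arithmetic in Python. The helper names are hypothetical, and zero-padding the key to 19 digits is an assumption added here so that string order matches numeric order; the slide does not mention it.

```python
from datetime import datetime

DOTNET_MAX_TICKS = 3155378975999999999   # value of DateTime.MaxValue.Ticks in .NET
EPOCH = datetime(1, 1, 1)                # .NET ticks count 100 ns units from this date


def to_ticks(moment):
    """Convert a naive UTC datetime to .NET-style ticks."""
    delta = moment - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def reverse_tick_row_key(moment):
    """Row key that sorts newer moments before older ones."""
    return f"{DOTNET_MAX_TICKS - to_ticks(moment):019d}"


older = reverse_tick_row_key(datetime(2011, 1, 1))
newer = reverse_tick_row_key(datetime(2011, 6, 1))
latest_first = sorted([older, newer])    # ascending key order, newest item first
```

With keys like these, a query that takes the first N rows of a partition returns the N most recent items.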
  27. Prefix-Based Retrieval
      • Use CompareTo with the ‘>’ and ‘<’ comparisons effectively
      • Example: blog.PartitionKey.CompareTo(“Mic”) >= 0
  28. Q&A
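For the prefix-based retrieval tip, the lower bound from the slide (>= "Mic") is usually paired with an upper bound so that the range stops after the last key with that prefix. The sketch below shows one way to derive that bound. The helper names are hypothetical, and it assumes a non-empty prefix, ordinal string comparison, and a last character that is not the highest code point.

```python
def prefix_upper_bound(prefix):
    """Smallest string that sorts after every string starting with `prefix`."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def query_prefix(keys, prefix):
    """Range query: prefix <= key < upper bound, the pattern built with CompareTo."""
    upper = prefix_upper_bound(prefix)
    return [key for key in sorted(keys) if prefix <= key < upper]


partition_keys = ["Michael", "Microsoft", "Mid", "Max", "Mic"]
upper = prefix_upper_bound("Mic")
matches = query_prefix(partition_keys, "Mic")
```

Expressed as a range on the key, the filter can be served as a key-range scan rather than a full table scan.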
  29. Thank you