MS Cloud Day - Building web applications with Azure storage

  • Slide Objectives: Understand Tables. Speaker Notes: Within a storage account, a developer may create named tables. Tables store data as entities. An entity is a collection of named properties and their values, similar to a row. Tables are partitioned to support load balancing across storage nodes. Each table has as its first property a partition key that specifies the partition an entity belongs to. The second property is a row key that identifies an entity within a given partition. The combination of the partition key and the row key forms a primary key that identifies each entity uniquely within the table. The Table service does not enforce any schema. A developer may choose to implement and enforce a schema on the client side. Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx
  • Slide Objectives: Understand Tables and Entities. Speaker Notes: Tables store data as entities. An entity is a collection of named properties and their values, similar to a row (though this is not an RDBMS). Tables are partitioned to support load balancing across storage nodes. Each table has as its first property a partition key that specifies the partition an entity belongs to. The second property is a row key that identifies an entity within a given partition. The combination of the partition key and the row key forms a primary key that identifies each entity uniquely within the table. The Table service does not enforce any schema. A developer may choose to implement and enforce a schema on the client side. Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx http://msdn.microsoft.com/en-us/library/dd179338.aspx
  • Slide Objectives: Understand Flexible Entities. Speaker Notes: Tables store data as entities. A table can contain entities of any shape. There is no fixed schema. There is no schema checking. There is no strong typing: note that Birthdate is stored as both a datetime value and as a string. Note that we can add additional columns. Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx
  • Slide Objectives: Understand The Basic Query Syntax. Speaker Notes: Tables store data as entities. Querying is per the ADO.NET Data Services spec (http://msdn.microsoft.com/en-us/library/cc668784.aspx). Should endeavour to always include the partition key to limit the scope of the query: partitions are always served by a single storage node. Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx
  • Slide Objectives: Understand The Partition Key. Speaker Notes: Tables are partitioned to support load balancing across storage nodes. A table's entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value. The partition key is a unique identifier for the partition within a given table, specified by the PartitionKey property. The partition key forms the first part of an entity's unique identifier within the table. The partition key may be a string value up to 1 KB in size. You must include the PartitionKey property in every insert, update, and delete operation. Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/07/understanding-the-scalability-availability-durability-and-billing-of-windows-azure-storage.aspx http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx

Transcript

  • 1. Microsoft Cloud Day. K. Mohamed Faizal, Lead Consultant, NCS Pte Ltd. Building web applications with Azure Storage. Faizal has over a decade of experience in Information Technology with a focus on enabling portals, Internet & Intranet application development. In this session you will learn the storage capabilities of Windows Azure: Blobs, Tables and Queues. Discover how to create storage accounts; upload and retrieve blobs and blob metadata; create, update and query tables; and create a simple service that uses a message queue for communication.
  • 2. CL201: Building Web Applications with Azure Storage. K. Mohamed Faizal, Lead Consultant @ NCS (P) Ltd. 27th April 2011
  • 3. About Me – K Mohamed Faizal
  • 4. Agenda Windows Azure Storage Blobs, Tables, Queues Scalability – Best Practices & Tips Q&A
  • 5. In General: Web applications typically rely on a relational database (SQL Server). It is very difficult to design a scalable SQL Server deployment at low cost.
  • 6. Common Consideration Do you have enough space to store all the files you need? How do you add more storage capacity? If a disk crashes, where does your data go? Is the storage block load balanced? What if you lose your connection to the block? Is it redundant? At what point do you max out your disk, in terms of reading and writing? How do you evenly distribute load across all disks?
  • 7. Storage Options SQL Server Network share Distributed File System (DFS) Network-attached storage (NAS) Direct-attached storage (DAS) Storage area network (SAN)
  • 8. Storage Options. SQL Server: unless you use high-availability technology (such as clustering, mirroring, or replication), your database server is likely to be a single point of failure in the system. Network share: this cheap solution offers no redundancy and provides no ability to scale out. Distributed File System (DFS): using replication ensures that there are no single points of failure in this solution and that the data is held on multiple machines.
  • 9. Storage Options. Network-attached storage (NAS): NAS devices can range from being pretty cheap to very expensive, depending on the levels of scalability, performance, and redundancy that you require from the device. Direct-attached storage (DAS). Storage area network (SAN): SAN devices support replication and are highly scalable (they scale much higher than DAS devices do), fault tolerant, high performing, and incredibly expensive.
  • 10. I Need Storage That Is: cost effective, scalable, durable, and highly available.
  • 11. What is Windows Azure Storage? A cloud storage system that provides scalable, durable and highly available storage. Storage system abstractions: Blobs: provide a simple interface for storing named files along with metadata for the file. Tables: provide structured storage. A Table is a set of entities, which contain a set of properties. Queues: provide reliable storage and delivery of messages for an application. Drives: provide durable NTFS volumes for Windows Azure applications to use.
  • 12. Triplicate… Your data is replicated 3 or 4 times in their data centre
  • 13. Windows Azure and SQL Azure
        Vision. Azure Storage: highly scalable, highly available store in the Cloud. SQL Azure: scalable, highly available relational store in the Cloud.
        Access. Azure Storage: uses WCF Data Services - REST. SQL Azure: SqlClient + TSQL.
        Relational? Azure Storage: no. SQL Azure: yes, but with some limitations.
        Analogy. Azure Storage: file system. SQL Azure: RDBMS, as it is.
        Maximum amount of data in a single "database". Azure Storage: 100TB. SQL Azure: 50GB (up to Oct 2010).
        Price per GB per month. Azure Storage: $0.15. SQL Azure: $9.99.
  • 14. Windows Azure Storage Account. User creates a globally unique storage account name. Can choose geo-location to host storage account (e.g. “US Anywhere”, “US North Central”, “US South Central”). Can co-locate storage account with compute account. Receive a 256 bit secret key when creating account.
  • 15. Storage Account Capacity Storage Account Capacity at Commercial Availability Each storage account can store up to 100 TB Default limit of 5 storage accounts per subscription.
  • 16. Windows Azure Data Storage Concepts. Account > Container > Blobs: https://<account>.blob.core.windows.net/<container>. Account > Table > Entities: https://<account>.table.core.windows.net/<table>. Account > Queue > Messages: https://<account>.queue.core.windows.net/<queue>
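The URL shapes on this slide can be sketched as a small helper. This is only an illustration of the naming pattern shown above; the function names `storage_endpoints` and `resource_url` are made up for this example and are not part of any Azure SDK.

```python
# Hypothetical helpers (not an SDK API): build the per-service URLs
# from the <account>.<service>.core.windows.net pattern on the slide.
SERVICES = ("blob", "table", "queue")


def storage_endpoints(account: str) -> dict:
    """Return the base endpoint of each storage service for one account."""
    return {svc: f"https://{account}.{svc}.core.windows.net" for svc in SERVICES}


def resource_url(account: str, service: str, name: str) -> str:
    """Return the URL of a container, table, or queue inside an account."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    return f"{storage_endpoints(account)[service]}/{name}"


print(resource_url("sally", "blob", "images"))
```

The account name `sally` and container `images` are borrowed from the Shared Access Signature example later in the deck.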
  • 17. Create Storage Account: DEMO
  • 18. What is a BLOB? Binary Large OBject
  • 19. Current Storage Solutions. SQL Servers: challenges with cost, performance and backup; your database size may grow very big. File System Storage: load balance? Cost?
  • 20. Type of Blob Block Blob Page Blob
  • 21. Windows Azure Blobs Types Block blobs Targeted at streaming workloads Each blob consists of a sequence of blocks 2 Phase commit: Blocks are uploaded and then separately committed Efficient continuation and retry Send multiple out of order blocks in parallel and decide the block order during commit Random range reads possible Size limit 200GB per blob
  • 22. Block blobs. The local file has variable sized blocks (Block 1, Block 2, Block 3, ...). Upload blocks in parallel using PutBlock. Retry failed blocks. Commit the blob using PutBlockList. [Diagram: local file split into blocks and uploaded to a cloud blob]
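The two-phase flow on this slide (upload blocks in any order, then commit an ordered block list) can be sketched with an in-memory stand-in. The class `BlockBlobSketch` and the helper `upload` are assumptions made for illustration; they only mimic the PutBlock / PutBlockList idea and do not call the real service.

```python
import random


class BlockBlobSketch:
    """In-memory stand-in for a block blob (not the real service)."""

    def __init__(self):
        self.uncommitted = {}  # block id -> bytes, staged by put_block
        self.committed = b""   # blob content after the last commit

    def put_block(self, block_id: str, data: bytes) -> None:
        # Phase 1: stage a block; arrival order does not matter.
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids: list) -> None:
        # Phase 2: the commit decides the final block order.
        self.committed = b"".join(self.uncommitted[b] for b in block_ids)
        self.uncommitted.clear()


def upload(blob: BlockBlobSketch, payload: bytes, block_size: int) -> None:
    """Split payload into blocks, stage them out of order, then commit."""
    blocks = {
        f"{index:06d}": payload[offset:offset + block_size]
        for index, offset in enumerate(range(0, len(payload), block_size))
    }
    ordered_ids = list(blocks)
    arrival = ordered_ids[:]
    random.shuffle(arrival)  # simulate parallel, out-of-order uploads
    for block_id in arrival:
        blob.put_block(block_id, blocks[block_id])
    blob.put_block_list(ordered_ids)
```

A failed block would simply be staged again with the same id before the commit, which is what makes retry cheap in this model.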
  • 23. Windows Azure Blobs Types. Page Blobs: Targeted at random write workloads. Each blob consists of an array of pages. Size limit 1TB per blob. Each page range write is committed on PUT. Page is 512 bytes in size. Write boundary aligned at multiple of 512 bytes. Range reads possible. Pages that do not have data are zeroed out.
  • 24. Page blobs. Write 5K bytes with PutPage: data covers 0 to 5120. Clear 2K bytes starting at offset 1K with ClearPage: 1024 to 3072 is cleared. Overwrite 2K bytes starting at 2K with PutPage: 2048 to 4096 is rewritten. Truncate blob to 3K with SetMaxBlobSize: blob covers 0 to 3072.
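The four steps on this slide can be replayed against a small in-memory model. `PageBlobSketch` is a hypothetical class written for this note; it only mirrors the 512-byte alignment rule and the zero-fill behaviour described above, not the real PutPage or ClearPage calls.

```python
PAGE = 512  # page size in bytes, as stated on the previous slide


class PageBlobSketch:
    """In-memory stand-in for a page blob (not the real service)."""

    def __init__(self):
        self.data = bytearray()

    @staticmethod
    def _check_aligned(*values: int) -> None:
        if any(v % PAGE for v in values):
            raise ValueError("offsets and lengths must be multiples of 512")

    def put_page(self, offset: int, payload: bytes) -> None:
        self._check_aligned(offset, len(payload))
        end = offset + len(payload)
        if end > len(self.data):
            # Pages that were never written read back as zeros.
            self.data.extend(bytes(end - len(self.data)))
        self.data[offset:end] = payload

    def clear_page(self, offset: int, length: int) -> None:
        self._check_aligned(offset, length)
        end = min(offset + length, len(self.data))
        if offset < end:
            self.data[offset:end] = bytes(end - offset)

    def truncate(self, size: int) -> None:
        self._check_aligned(size)
        del self.data[size:]


blob = PageBlobSketch()
blob.put_page(0, b"\x01" * 5120)     # write 5K bytes
blob.clear_page(1024, 2048)          # clear 2K starting at offset 1K
blob.put_page(2048, b"\x02" * 2048)  # overwrite 2K starting at 2K
blob.truncate(3072)                  # truncate blob to 3K
```

After these calls the model holds original data in the first 1K, zeros in the second 1K, and the overwritten bytes in the third 1K.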
  • 25. Blob Storage: DEMO
  • 26. Sharing Your Files. Every blob request must be signed with the account owner’s key. To share your files: the container must be public, or use a Shared Access Signature (SAS) to share pre-authenticated URLs with users. With SAS, use container level access as it allows access to be easily revoked.
  • 27. Shared Access Signatures. Services want to distribute access to blobs, but do not want to distribute their secret key. Can create a Shared Access Signature (SAS) using the secret key, then give out the SAS to provide time based access to blobs. Valid time range: st=Start time (optional), se=End time. Two resource levels of access to grant (sr): c=Container | b=Blob. Four types of permissions, or any combination (sp): r=Read | w=Write | d=Delete | l=List. Signed Identifier (si, optional): allows time range and permissions to be stored in the blob service for the SAS; provides instant revocation of SAS. Example: https://sally.blob.core.windows.net/images/pic1.jpg?st=2009-11-07T08:49Z&se=2009-11-07T09:49Z&sr=c&sp=rw&sig=3OSeIHP8haK%2fle9%2bBK3BX1DsdMM%3d&si=foo
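To make the query parameters on this slide concrete, the sketch below takes the example URL apart with the Python standard library and checks a point in time against the `st`/`se` window. It does not compute or verify the `sig` value, since the signing format is not given in the slides; `parse_sas` and `in_valid_window` are hypothetical helper names.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

SAS_URL = (
    "https://sally.blob.core.windows.net/images/pic1.jpg"
    "?st=2009-11-07T08:49Z&se=2009-11-07T09:49Z&sr=c&sp=rw"
    "&sig=3OSeIHP8haK%2fle9%2bBK3BX1DsdMM%3d&si=foo"
)


def parse_sas(url: str) -> dict:
    """Return the SAS query parameters as a flat dict (percent-decoded)."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


def _parse_time(text: str) -> datetime:
    # The slide uses minute precision with a trailing Z for UTC.
    return datetime.strptime(text, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)


def in_valid_window(fields: dict, now: datetime) -> bool:
    """Check 'now' against st (optional) and se (required)."""
    if "st" in fields and now < _parse_time(fields["st"]):
        return False
    return now <= _parse_time(fields["se"])


fields = parse_sas(SAS_URL)
print(fields["sr"], fields["sp"], fields["si"])
```

Only the client-side reading of the parameters is shown here; the service would still have to validate the signature against the account key.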
  • 28. Shared Access Signatures: DEMO
  • 29. What Are Snapshots? Create a point in time read-only copy of a blob. Every snapshot creates a new read-only point in time copy. Charged only for unique blocks or pages, i.e. blocks or pages are reused. For reuse, use WritePages or PutBlock & PutBlockList. Restore snapshots using copy blob. Clean up your snapshots.
  • 30. What does unique mean? Base blob alphabets.txt holds blocks ID=1 A and ID=2 BB; snapshot #1 (snapshot=2011-04-10T19:26:24.8690267Z) holds the same blocks ID=1 A and ID=2 BB. The base blob then holds ID=1 A, ID=2 BB and ID=3 CCC; snapshot #2 (snapshot=2011-05-10T19:26:24.8690267Z) holds the same blocks ID=1 A, ID=2 BB and ID=3 CCC.
  • 31. What does unique mean? Snapshot #1 (snapshot=2011-04-10T19:26:24.8690267Z) holds ID=1 A and ID=2 BB. Snapshot #2 (snapshot=2011-05-10T19:26:24.8690267Z) holds ID=1 A, ID=2 BB and ID=3 CCC. Base blob alphabets.txt holds ID=1 A, ID=2 BB and ID=3 CCC, as does snapshot #3 (snapshot=2011-05-10T19:28:24.8690267Z). UploadFile/UploadText/UploadFromStream/UploadByteArray overwrites all blocks, so you are charged for the entire snapshot and base blob.
  • 32. Windows Azure Content Delivery Network. Scenario: frequently accessed blobs, accessed from around the world. Desire: Windows Azure Content Delivery Network (CDN) provides high-bandwidth global blob content delivery. 18 locations globally (US, Europe, Asia, Australia and South America), and growing. Blob service URL vs CDN URL: Windows Azure Blob URL: http://sally.blob.core.windows.net/ Windows Azure CDN URL: http://<guid>.vo.msecnd.net/ Custom Domain Name for CDN: http://events.cohowinery.com/
  • 33. Windows Azure Content Delivery Network. To enable CDN: register for CDN via Dev Portal (http://sally.blob.core.windows.net/ maps to http://guid01.vo.msecnd.net/); set container images to public (content is not accessible until then). [Diagram: a request for http://guid01.vo.msecnd.net/images/pic1.jpg reaches a CDN edge location; edge locations hold pic1.jpg subject to a TTL and fetch it from http://sally.blob.core.windows.net/images/pic1.jpg in the Windows Azure Blob Service]
  • 34. Windows Azure Drive. Provides a durable NTFS volume for Windows Azure applications. Use existing NTFS APIs. Easy migration path to the cloud. Durability and survival of data on application failover or hardware failure. All flushed and un-buffered writes to drive are made durable. A Windows Azure Drive is a Page Blob. Mounts Page Blob as an NTFS drive. Mounted by one VM at a time for read/write. A VM can dynamically mount up to 16 drives. Drives can be up to 1 TB.
  • 35. WINDOWS AZURE TABLES
  • 36. Windows Azure Tables Provides Structured Storage Massively Scalable and Durable Tables Billions of entities (rows) and TBs of data A storage account can contain many tables No limit on number of entities (aka rows) in each table Provides flexible schema Familiar and Easy to use API WCF Data Services - .NET classes and LINQ REST (OData Protocol) – with any platform or language
  • 37. Windows Azure Tables. Is not relational. Cannot: create foreign key relationships between tables; perform server side joins between tables; create custom indexes on the tables. No server side Count(), for example.
  • 38. Table Data Model. Table: a storage account can create many tables; table name is scoped by account; set of entities (i.e. rows). Entity: set of properties (columns); required properties PartitionKey, RowKey and Timestamp.
  • 39. Required Entity Properties. PartitionKey & RowKey: uniquely identify an entity; define the sort order; use them to scale your application. Timestamp: read only; optimistic concurrency.
  • 40. Required Properties. All entities must have the following properties: Timestamp, PartitionKey, RowKey. PartitionKey + RowKey = “primary key”
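The "primary key" idea on these slides can be sketched with a dictionary keyed by the (PartitionKey, RowKey) pair. `TableSketch` is a made-up in-memory model for illustration, not the service API; the sample category and title values are taken from the partition slide later in the deck.

```python
class TableSketch:
    """In-memory stand-in for a table keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        self.rows = {}

    def insert(self, entity: dict) -> None:
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self.rows:
            # The pair is the primary key, so it must be unique.
            raise KeyError(f"entity {key} already exists")
        self.rows[key] = dict(entity)

    def scan(self) -> list:
        # Entities come back ordered by PartitionKey, then RowKey.
        return [self.rows[key] for key in sorted(self.rows)]


table = TableSketch()
table.insert({"PartitionKey": "Canoes", "RowKey": "Flatwater", "ModelYear": 2006})
table.insert({"PartitionKey": "Bikes", "RowKey": "Super Duper Cycle", "ModelYear": 2009})
table.insert({"PartitionKey": "Bikes", "RowKey": "Quick Cycle 200 Deluxe", "ModelYear": 2007})
print([e["RowKey"] for e in table.scan()])
```

The Timestamp property is left out here because the slides describe it as read only, i.e. maintained by the service rather than by the caller.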
  • 41. Table Details Not an RDBMS! More on table modeling in Storage Strategies session Table Create, Query, Delete Tables can have metadata Entities Insert Update Merge – Partial update Replace – Update entire entity Delete Query Entity Group Transactions Multiple CUD Operations in a single atomic transaction
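The difference between Merge (partial update) and Replace (update entire entity) can be shown on plain dictionaries. This is a sketch of the semantics as worded on the slide; `merge_entity` and `replace_entity` are hypothetical names, and the assumption that the key properties survive a replace is made explicit in the code.

```python
KEY_PROPERTIES = ("PartitionKey", "RowKey")


def merge_entity(stored: dict, changes: dict) -> dict:
    """Partial update: properties not mentioned in 'changes' are kept."""
    return {**stored, **changes}


def replace_entity(stored: dict, new_entity: dict) -> dict:
    """Full update: only the key properties survive from the stored entity.

    Assumption for this sketch: the keys identify the entity and are
    therefore carried over unchanged.
    """
    replaced = {name: stored[name] for name in KEY_PROPERTIES}
    replaced.update(new_entity)
    return replaced


stored = {"PartitionKey": "Akers", "RowKey": "Kim", "Birthdate": "2/2/1981"}
merged = merge_entity(stored, {"Fav Sport": "Canoeing"})
replaced = replace_entity(stored, {"Fav Sport": "Canoeing"})
```

Here `merged` still carries `Birthdate`, while `replaced` does not.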
  • 42. Entity Properties Entity can have up to 255 properties Up to 1MB per entity Mandatory Properties for every entity PartitionKey & RowKey (only indexed properties) Uniquely identifies an entity Defines the sort order Timestamp Optimistic Concurrency. Exposed as an HTTP ETag No fixed schema for other properties Each property is stored as a <name, typed value> pair No schema stored for a table Properties can be the standard .NET types String, binary, bool, DateTime, GUID, int, int64, and double
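Optimistic concurrency with an ETag can be sketched as a conditional update: a writer must present the version it read, and the write is rejected if the entity changed in between. `VersionedTable` is an in-memory illustration; it uses a simple counter as the ETag, which is an assumption of this sketch (the slide says the service derives the ETag from Timestamp), and `PreconditionFailed` stands in for the HTTP 412 Precondition Failed status.

```python
import itertools


class PreconditionFailed(Exception):
    """Stand-in for an HTTP 412 Precondition Failed response."""


class VersionedTable:
    """In-memory table where every write produces a new ETag."""

    def __init__(self):
        self._rows = {}
        self._versions = itertools.count(1)

    def insert(self, key, entity: dict) -> str:
        etag = f'"{next(self._versions)}"'  # counter as a stand-in ETag
        self._rows[key] = (dict(entity), etag)
        return etag

    def get(self, key):
        entity, etag = self._rows[key]
        return dict(entity), etag

    def update(self, key, entity: dict, if_match: str) -> str:
        _, current = self._rows[key]
        if if_match != current:
            raise PreconditionFailed(f"entity {key} changed since it was read")
        return self.insert(key, entity)
```

Two readers that fetch the same entity both hold the same ETag; the first update succeeds and the second one fails until that client re-reads the entity.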
  • 43. No Fixed Schema
        First   Last      Birthdate     Fav Sport
        Kim     Akers     2/2/1981
        Nancy   Anderson  3/15/1965     Canoeing
        Mark    Hassall   May 1, 1976
  • 44. Querying: ?$filter=Last eq ‘Akers’
        First   Last      Birthdate
        Kim     Akers     2/2/1981
        Nancy   Anderson  3/15/1965
        Mark    Hassall   May 1, 1976
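A query URL with the `$filter` option from this slide can be assembled as follows. This is a sketch under assumptions: the table name `People` is invented, the URL shape follows the Data Storage Concepts slide, and the request would still need to be signed before the service accepts it. `with_partition` reflects the advice in the speaker notes to include the partition key in queries.

```python
from urllib.parse import quote


def query_url(account: str, table: str, filter_expression: str) -> str:
    """Return a table query URL with a percent-encoded $filter option."""
    base = f"https://{account}.table.core.windows.net/{table}"
    return f"{base}?$filter={quote(filter_expression)}"


def with_partition(partition_key: str, filter_expression: str) -> str:
    """Narrow a filter to one partition so a single node can serve it."""
    return f"PartitionKey eq '{partition_key}' and {filter_expression}"


print(query_url("sally", "People", "Last eq 'Akers'"))
```

The code uses straight quotes in the filter text; the curly quotes on the slide are a typesetting artifact of the presentation.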
  • 45. Purpose of the PartitionKey Entity Locality Entities in the same partition will be stored together Efficient querying and cache locality Endeavour to include partition key in all queries Entity Group Transactions Atomic multiple Insert/Update/Delete in same partition in a single transaction Table Scalability Target throughput – 500 tps/partition, several thousand tps/account Windows Azure monitors the usage patterns of partitions Automatically load balance partitions Each partition can be served by a different storage node Scale to meet the traffic needs of your table
  • 46. Partitions and Partition Ranges
        PartitionKey (Category)   RowKey (Title)           Timestamp   ModelYear
        Bikes                     Super Duper Cycle        …           2009
        Bikes                     Quick Cycle 200 Deluxe   …           2007
        …                         …                        …           …
        Canoes                    Whitewater               …           2009
        Canoes                    Flatwater                …           2006
        Rafts                     14ft Super Tourer        …           1999
        …                         …                        …           …
        Skis                      Fabrikam Back Trackers   …           2009
        …                         …                        …           …
        Tents                     Super Palace             …           2008
        [Diagram: the same table split into two partition ranges, Bikes to Canoes and Rafts to Tents]
  • 47. Table Storage: DEMO
  • 48. WINDOWS AZURE QUEUES
  • 49. Windows Azure Queues. Queues are highly scalable and available, and provide reliable message delivery. Simple, asynchronous work dispatch. A storage account can create any number of queues. 8K message size limit and default expiry of 7 days. Programming semantics ensure that a message can be processed at least once. Get message to make the message invisible. Delete message to remove the message. Access is provided via REST.
  • 50. Account, Queues and Messages. An account can create many queues. Queue name is scoped by the account. A queue contains messages. No limit on number of messages stored in a queue. Set a limit for message expiration. Messages: message size <= 8 KB. To store larger data, store data in blob/entity storage, and the blob/entity name in the message. Message now has dequeue count.
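The advice to keep large payloads out of the queue can be sketched with plain Python containers standing in for the queue and the blob store. The names `enqueue` and `dequeue` and the message layout are assumptions for this example only.

```python
MAX_MESSAGE_BYTES = 8 * 1024  # 8 KB message size limit from the slide


def enqueue(queue: list, blobs: dict, payload: bytes, blob_name: str) -> None:
    """Send small payloads inline; park large ones in blob storage."""
    if len(payload) <= MAX_MESSAGE_BYTES:
        queue.append({"inline": payload})
    else:
        blobs[blob_name] = payload
        queue.append({"blob": blob_name})  # the message carries only the name


def dequeue(queue: list, blobs: dict) -> bytes:
    """Return the payload, following the blob reference when present."""
    message = queue.pop(0)
    if "inline" in message:
        return message["inline"]
    return blobs[message["blob"]]
```

In this model the consumer is responsible for removing the blob once the work is done, a step left out of the sketch.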
  • 51. Removing Poison Messages. Producers P1 and P2, consumers C1 and C2. 1. C1: GetMessage(Q, 30 s) → msg 1. 2. C2: GetMessage(Q, 30 s) → msg 2.
  • 52. Removing Poison Messages. Producers P1 and P2, consumers C1 and C2. 1. C1: GetMessage(Q, 30 s) → msg 1. 2. C2: GetMessage(Q, 30 s) → msg 2. 3. C2 consumed msg 2. 4. C2: DeleteMessage(Q, msg 2). 5. C1 crashed. 6. msg1 visible 30 s after Dequeue. 7. C2: GetMessage(Q, 30 s) → msg 1.
  • 53. Removing Poison Messages. Producers P1 and P2, consumers C1 and C2. 1. C1: Dequeue(Q, 30 sec) → msg 1. 2. C2: Dequeue(Q, 30 sec) → msg 2. 3. C2 consumed msg 2. 4. C2: Delete(Q, msg 2). 5. C1 crashed. 6. msg1 visible 30s after Dequeue. 7. C2: Dequeue(Q, 30 sec) → msg 1. 8. C2 crashed. 9. msg1 visible 30s after Dequeue. 10. C1 restarted. 11. C1: Dequeue(Q, 30 sec) → msg 1. 12. DequeueCount > 2. 13. C1: Delete(Q, msg1).
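The sequence on these three slides can be replayed with a small simulation. `QueueSketch` models only the visibility timeout and the dequeue count with a logical clock in seconds; it is an assumption-laden stand-in, not the queue service, and the threshold of 2 mirrors the "DequeueCount > 2" step above.

```python
class QueueSketch:
    """In-memory queue with a visibility timeout and a dequeue count."""

    def __init__(self):
        self.messages = []

    def put(self, body: str) -> None:
        self.messages.append({"body": body, "visible_at": 0, "dequeue_count": 0})

    def get(self, now: int, timeout: int = 30):
        for message in self.messages:
            if message["visible_at"] <= now:
                message["visible_at"] = now + timeout  # hide from others
                message["dequeue_count"] += 1
                return message
        return None

    def delete(self, message: dict) -> None:
        self.messages = [m for m in self.messages if m is not message]


def consume(queue: QueueSketch, now: int, handler, max_dequeues: int = 2) -> str:
    """Process one message; drop it if it has been dequeued too often."""
    message = queue.get(now)
    if message is None:
        return "empty"
    if message["dequeue_count"] > max_dequeues:
        queue.delete(message)  # poison message: stop retrying it
        return "poison-removed"
    handler(message["body"])   # an exception here models a consumer crash
    queue.delete(message)
    return "processed"
```

If the handler raises, the message is never deleted and becomes visible again once the timeout passes, which is the "at least once" behaviour the Queues slide describes. A real application would typically log or archive the poison message before deleting it.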
  • 54. Queue Storage: DEMO
  • 55. SCALABILITY: BEST PRACTICES & TIPS
  • 56. Know The Scalability Targets. Single Blob Partition: throughput up to 60 MB/s. Single Queue/Table Partition: up to 500 transactions (entities or messages) per second. Storage Account: SLA of 99.9% availability; capacity up to 100 TBs; transactions up to 5000 entities per second; bandwidth up to 3 gigabits per second. Scale above the limits: partition between multiple storage accounts and partitions. When a limit is hit, the app may see ‘503 server busy’: applications should implement exponential back-off.
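The exponential back-off recommended on this slide can be sketched as a retry wrapper. `ServerBusy` stands in for a '503 server busy' response, and the attempt count and delay values are illustrative defaults chosen for this example, not figures from the slides.

```python
import time


class ServerBusy(Exception):
    """Stand-in for a '503 server busy' response from the service."""


def with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                 sleep=time.sleep):
    """Call operation(), doubling the wait after each ServerBusy error."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

The `sleep` parameter is injectable so the wrapper can be exercised without real waiting; production code would often add random jitter to the delay so that many clients do not retry in lockstep.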
  • 57. Automatic Load Balancing - Assignment. [Diagram: a VIP in front of servers that are assigned partitions, on top of a distributed file system. Legend: partition, server load]
  • 58. Partition Keys In Each Abstraction. Blobs: Container name + Blob name. Every blob and its snapshots are in a single partition.
        Container Name   Blob Name               Snapshot
        image            annarbor/bighouse.jpg
        image            annarbor/bighouse.jpg   2009-12-03T15:26:19.4466877Z
        image            annarbor/denard.jpg
        backup           annarbor/bighouse.jpg
  • 59. Partition Keys In Each Abstraction. Entities: TableName + PartitionKey. Entities with the same PartitionKey value are served from the same partition.
        Table Name      PartitionKey   RowKey         Zipcode   City
        CustomerOrder   Alaska         Tina Fey       99501     Anchorage
        CustomerOrder   Alaska         Sarah Palin    99501     Anchorage
        CustomerOrder   Washington     Bill Johnson   98053     Redmond
        Customers       Washington     Bill Johnson   98053     Redmond
  • 60. Partition Keys In Each Abstraction. Messages: Queue Name. All messages for a single queue belong to the same partition.
        Queue      Message
        jobs       Message1
        jobs       Message2
        workflow   Message1
  • 61. Table Inserts: Single Partition (SP) vs. Multiple Partitions (MP). [Chart: successful inserts in entities/sec (axis 0 to 12000) for SP and MP, plotted against 1, 2, 5, 10 and 16 Extra Large VMs (15 threads per VM)]
  • 62. Table Get: Single Partition (SP) vs. Multiple Partitions (MP). [Chart: successful gets in entities/sec (axis 0 to 12000) for SP and MP, plotted against 1, 2, 5, 10 and 16 Extra Large VMs (15 threads per VM)]
  • 63. Are Unique Partition Key Values Sufficient To Scale? Avoid “Append/Prepend Only” patterns for high scale. Timestamp as Partition Key looks like an obvious choice. It is not a single partition as time moves forward. Append/Prepend only: requests go to a single partition range; load balancing does not help; server may throttle. [Diagram: applications and client writing to a table whose partition key is a timestamp, with rows from 2010-10-15 02:00:01 through 2010-10-17 12:30:03 spread over Server A and Server B]
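One common way to avoid the append-only pattern described on this slide is to put a short hash-derived prefix in front of the timestamp so that consecutive writes land in different key ranges. This technique is not taken from the slides; it is shown only as an illustration, and `bucketed_key` and the bucket count are assumptions of this sketch.

```python
import hashlib


def bucketed_key(timestamp: str, buckets: int = 8) -> str:
    """Prefix a timestamp with a stable bucket number derived from its hash."""
    digest = hashlib.md5(timestamp.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{timestamp}"


# Timestamps one second apart, in the style shown on the slide.
stamps = [f"2010-10-17 12:30:{second:02d}" for second in range(60)]
keys = [bucketed_key(stamp) for stamp in stamps]
prefixes = {key.split("-", 1)[0] for key in keys}
print(sorted(prefixes))
```

The trade-off is that a query over a time range now has to fan out over every bucket prefix, so this suits write-heavy tables better than ones that are mostly read by time range.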