Microsoft Cloud Day. K.Mohamed Faizal, Lead Consultant, NCS Pte Ltd. Building web applications with Azure Storage. Faizal has over a decade of experience in Information Technology with a focus on enabling portals, Internet & Intranet application development. In this session you will learn the storage capabilities of Windows Azure: Blobs, Tables and Queues. Discover how to create storage accounts; upload and retrieve blobs and blob metadata; create, update and query tables; and create a simple service that uses a message queue for communication.
CL201: Building Web Applications with Azure Storage. K.Mohamed Faizal, Lead Consultant @ NCS (P) Ltd. 27th April 2011
Agenda Windows Azure Storage Blobs, Tables, Queues Scalability – Best Practices & Tips Q&A
In General: Web applications typically rely on a relational database (SQL Server), but it is very difficult to design a scalable SQL Server deployment at low cost.
Common Consideration Do you have enough space to store all the files you need? How do you add more storage capacity? If a disk crashes, where does your data go? Is the storage block load balanced? What if you lose your connection to the block? Is it redundant? At what point do you max out your disk, in terms of reading and writing? How do you evenly distribute load across all disks?
Storage Options SQL Server Network share Distributed File System (DFS) Network-attached storage (NAS) Direct-attached storage (DAS) Storage area network (SAN)
Storage Options. SQL Server: unless you use high-availability technology (such as clustering, mirroring, or replication), your database server is likely to be a single point of failure in the system. Network share: this cheap solution offers no redundancy and provides no ability to scale out. Distributed File System (DFS): using replication ensures that there are no single points of failure in this solution and that the data is held on multiple machines.
Storage Options. Network-attached storage (NAS): NAS devices can range from being pretty cheap to very expensive, depending on the levels of scalability, performance, and redundancy that you require from the device. Direct-attached storage (DAS). Storage area network (SAN): SAN devices support replication and are highly scalable (they scale much higher than do DAS devices), fault tolerant, high performing, and incredibly expensive.
I Need Storage That Is: cost effective, scalable, durable, and highly available
What is Windows Azure Storage? Cloud Storage System Provides Scalable, Durable and Highly Available Storage System Abstractions: Blobs – Provides a simple interface for storing named files along with metadata for the file Tables – Provides structured storage. A Table is a set of entities, which contain a set of properties Queues – Provides reliable storage and delivery of messages for an application Drives – Provides durable NTFS volumes for Windows Azure applications to use
Triplicate… Your data is replicated 3 or 4 times in their data centre
Windows Azure and SQL Azure
(row) | Azure Storage | SQL Azure
Vision | Highly scalable, highly available store in the Cloud | Scalable, highly available relational store in the Cloud
Access | Uses WCF Data Services - REST | SqlClient + TSQL
Relational? | No | Yes, but with some limitations
Analogy | File System | RDBMS, as it is
Maximum amount of data in a single "database" | 100TB | 50GB (up to Oct 2010)
Price per GB per month | $0.15 | $9.99
Windows Azure Storage Account. User creates a globally unique storage account name. Can choose geo-location to host storage account: "US Anywhere", "US North Central", "US South Central". Can co-locate storage account with compute account. Receive a 256 bit secret key when creating account.
Storage Account Capacity Storage Account Capacity at Commercial Availability Each storage account can store up to 100 TB Default limit of 5 storage accounts per subscription.
Windows Azure Data Storage Concepts
Account > Container > Blobs: https://<account>.blob.core.windows.net/<container>
Account > Table > Entities: https://<account>.table.core.windows.net/<table>
Account > Queue > Messages: https://<account>.queue.core.windows.net/<queue>
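The endpoint layout can be sketched as a small helper. The function name `storage_url` is hypothetical; only the host pattern is taken from the slide.

```python
# Hypothetical helper: builds endpoint URLs following the host pattern on the slide.
SERVICES = ("blob", "table", "queue")


def storage_url(account: str, service: str, resource: str) -> str:
    """Return https://<account>.<service>.core.windows.net/<resource>."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    return f"https://{account}.{service}.core.windows.net/{resource}"
```

For example, `storage_url("sally", "blob", "images")` addresses the container `images` in the account `sally`.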
Windows Azure Blobs Types Block blobs Targeted at streaming workloads Each blob consists of a sequence of blocks 2 Phase commit: Blocks are uploaded and then separately committed Efficient continuation and retry Send multiple out of order blocks in parallel and decide the block order during commit Random range reads possible Size limit 200GB per blob
Block blobs. A local file is split into variable sized blocks (Block 1, Block 2, Block 3, ... Block 5). Upload blocks in parallel using PutBlock. Retry failed blocks. Commit the blob using PutBlockList. [Diagram: local file blocks uploaded to a cloud blob]
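The upload-then-commit flow can be illustrated with an in-memory stand-in. This is a sketch only: `FakeBlockBlob` and `upload_in_blocks` are hypothetical names, and nothing here calls the real storage service.

```python
import base64
from concurrent.futures import ThreadPoolExecutor


class FakeBlockBlob:
    """In-memory stand-in for a block blob (assumption: not the real service)."""

    def __init__(self):
        self.uncommitted = {}  # block id -> bytes staged by put_block
        self.committed = b""   # content visible after put_block_list

    def put_block(self, block_id, data):
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids):
        # The final order is decided at commit time, not by upload order.
        self.committed = b"".join(self.uncommitted[b] for b in block_ids)
        self.uncommitted.clear()


def upload_in_blocks(blob, data, block_size=4, retries=3):
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    ids = [base64.b64encode(f"{i:06d}".encode()).decode() for i in range(len(chunks))]

    def send(pair):
        block_id, chunk = pair
        for attempt in range(retries):
            try:
                blob.put_block(block_id, chunk)
                return
            except IOError:
                # Retry only the failed block; give up after the last attempt.
                if attempt == retries - 1:
                    raise

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(send, zip(ids, chunks)))  # blocks may arrive in any order
    blob.put_block_list(ids)                    # the commit fixes the order
```

Because each block is staged independently, a failed block can be retried without resending the rest of the file.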
Windows Azure Blobs Types Page Blobs Targeted at random write workloads Each blob consists of an array of pages Size limit 1TB per blob Page Each page range write is committed on PUT Page is 512 bytes in size Write boundary aligned at multiple of 512 byte Range reads possible Pages that do not have data are zeroed out
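The 512-byte alignment rule can be checked before a write is issued. A minimal sketch; the helper names `is_page_aligned` and `pad_to_page` are hypothetical.

```python
PAGE_SIZE = 512  # bytes, as stated on the slide


def is_page_aligned(offset: int, length: int) -> bool:
    """True when a write starts and ends on a 512-byte boundary."""
    return (
        offset >= 0
        and length > 0
        and offset % PAGE_SIZE == 0
        and length % PAGE_SIZE == 0
    )


def pad_to_page(data: bytes) -> bytes:
    """Zero-pad data so its length is a multiple of the page size."""
    remainder = len(data) % PAGE_SIZE
    if remainder == 0:
        return data
    return data + b"\x00" * (PAGE_SIZE - remainder)
```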
Sharing Your Files. Every blob request must be signed with the account owner's key. To share your files, either make the container public or use a Shared Access Signature (SAS) to share pre-authenticated URLs with users. With SAS, use container level access as it allows access to be easily revoked.
Shared Access Signatures. Services want to distribute access to blobs, but do not want to distribute their secret key. Can create a Shared Access Signature (SAS) using the secret key, then give out the SAS to provide time based access to blobs. Valid time range: st=Start time (optional), se=End time. Two resource levels of access to grant (sr): c=Container | b=Blob. Four types of permissions (sp), or any combination: r=Read | w=Write | d=Delete | l=List. Signed Identifier (si, optional): allows time range and permissions to be stored in the blob service for the SAS; provides instant revocation of SAS. Example: https://sally.blob.core.windows.net/images/pic1.jpg?st=2009-11-07T08:49Z&se=2009-11-07T09:49Z&sr=c&sp=rw&sig=3OSeIHP8haK%2fle9%2bBK3BX1DsdMM%3d&si=foo
What Are Snapshots? Create a point in time read-only copy of a blob. Every snapshot creates a new read only point in time copy. Charged only for unique blocks or pages, i.e. reuse blocks or pages. For reuse, use WritePages or PutBlock & PutBlockList. Restore snapshots using copy blob. Clean up your snapshots.
What does unique mean? Base blob alphabets.txt holds blocks ID=1 "A" and ID=2 "BB"; snapshot #1 (2011-04-10T19:26:24.8690267Z) holds the same two blocks. A block ID=3 "CCC" is then added to the base blob; snapshot #2 (2011-05-10T19:26:24.8690267Z) holds blocks ID=1 "A", ID=2 "BB" and ID=3 "CCC". [Diagram: base blob and snapshots sharing blocks]
What does unique mean? Snapshot #1 (2011-04-10T19:26:24.8690267Z) holds blocks ID=1 "A" and ID=2 "BB"; snapshot #2 (2011-05-10T19:26:24.8690267Z), snapshot #3 (2011-05-10T19:28:24.8690267Z) and the base blob alphabets.txt each hold blocks ID=1 "A", ID=2 "BB" and ID=3 "CCC". UploadFile/UploadText/UploadFromStream/UploadByteArray overwrites all blocks, so you are charged for the entire snapshot and the base blob.
Windows Azure Content Delivery Network. Scenario: frequently accessed blobs, accessed from around the world. Windows Azure Content Delivery Network (CDN) provides high-bandwidth global blob content delivery. 18 locations globally (US, Europe, Asia, Australia and South America), and growing. Blob service URL vs CDN URL: Windows Azure Blob URL: http://sally.blob.core.windows.net/ Windows Azure CDN URL: http://<guid>.vo.msecnd.net/ Custom Domain Name for CDN: http://events.cohowinery.com/
Windows Azure Content Delivery Network. To enable CDN: register for CDN via the Dev Portal, then set the container "images" to public (until then the content is marked "Not Accessible!"). [Diagram: edge locations of the Content Delivery Network cache pic1.jpg, subject to a TTL, from http://sally.blob.core.windows.net/images/pic1.jpg in the Windows Azure Blob Service and serve it at http://guid01.vo.msecnd.net/images/pic1.jpg]
Windows Azure Drive. Provides a durable NTFS volume for Windows Azure applications. Use existing NTFS APIs. Easy migration path to the cloud. Durability and survival of data on application failover or hardware failure. All flushed and un-buffered writes to drive are made durable. A Windows Azure Drive is a Page Blob: mounts the Page Blob as an NTFS drive. Mounted by one VM at a time for read/write. A VM can dynamically mount up to 16 drives. Drives can be up to 1 TB.
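The idea of signing access parameters with the secret key, so that the key itself is never handed out, can be sketched with an HMAC. The string-to-sign layout below is a simplifying assumption for illustration; the blob service defines the exact format, which should be taken from its documentation. The function name `sign_sas` is hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode


def sign_sas(account_key_b64, resource_path, start, expiry, resource, permissions, identifier=""):
    """Return a query string carrying st/se/sr/sp/sig (and si when given)."""
    # ASSUMPTION: simplified string-to-sign; the real service specifies its own layout.
    string_to_sign = "\n".join([permissions, start, expiry, resource_path, identifier])
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    params = {
        "st": start,
        "se": expiry,
        "sr": resource,
        "sp": permissions,
        "sig": base64.b64encode(digest).decode(),
    }
    if identifier:
        params["si"] = identifier
    return urlencode(params)
```

Anyone holding the resulting query string can use the granted permissions until the end time, but cannot derive the account key from the signature.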
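The charging rule for snapshots can be approximated with a small accounting model. This is an illustrative assumption, not the billing implementation: each block is tagged with the write that produced it, and a block shared unchanged between the base blob and its snapshots is counted once. The name `billable_bytes` is hypothetical.

```python
def billable_bytes(versions):
    """Sum block sizes, counting each (block id, write stamp) pair once.

    versions: list of dicts {block_id: (write_stamp, size)}, one dict for the
    base blob and one per snapshot.
    """
    unique = {}
    for blocks in versions:
        for block_id, (stamp, size) in blocks.items():
            unique[(block_id, stamp)] = size
    return sum(unique.values())


# Blocks written once and reused (PutBlock + PutBlockList): shared storage.
snap1 = {1: ("t0", 1), 2: ("t0", 2)}
snap2 = {1: ("t0", 1), 2: ("t0", 2), 3: ("t1", 3)}
base_reused = dict(snap2)

# Whole blob re-uploaded (e.g. UploadText): every block gets a new write stamp.
base_overwritten = {1: ("t2", 1), 2: ("t2", 2), 3: ("t2", 3)}
```

In this model, `billable_bytes([snap1, snap2, base_reused])` counts 6 bytes, while replacing the base with `base_overwritten` counts 12, which matches the slide's warning that a full re-upload is charged for both the snapshot and the base blob.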
Windows Azure Tables Provides Structured Storage Massively Scalable and Durable Tables Billions of entities (rows) and TBs of data A storage account can contain many tables No limit on number of entities (aka rows) in each table Provides flexible schema Familiar and Easy to use API WCF Data Services - .NET classes and LINQ REST (OData Protocol) – with any platform or language
Windows Azure Tables is not relational. You cannot: create foreign key relationships between tables; perform server side joins between tables; create custom indexes on the tables. There is no server side Count(), for example.
Table Data Model Table A storage account can create many tables Table name is scoped by account Set of entities (i.e. rows) Entity Set of properties (columns) Required properties PartitionKey, RowKey and Timestamp
Required Entity Properties PartitionKey & RowKey Uniquely identifies an entity Defines the sort order Use them to scale your application Timestamp Read only Optimistic Concurrency
Required Properties. All entities must have the following properties: Timestamp, PartitionKey, RowKey. PartitionKey + RowKey = "primary key"
Table Details Not an RDBMS! More on table modeling in Storage Strategies session Table Create, Query, Delete Tables can have metadata Entities Insert Update Merge – Partial update Replace – Update entire entity Delete Query Entity Group Transactions Multiple CUD Operations in a single atomic transaction
Entity Properties Entity can have up to 255 properties Up to 1MB per entity Mandatory Properties for every entity PartitionKey & RowKey (only indexed properties) Uniquely identifies an entity Defines the sort order Timestamp Optimistic Concurrency. Exposed as an HTTP ETag No fixed schema for other properties Each property is stored as a <name, typed value> pair No schema stored for a table Properties can be the standard .NET types String, binary, bool, DateTime, GUID, int, int64, and double
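Optimistic concurrency with an ETag can be illustrated with an in-memory table. This is a sketch under stated assumptions: `FakeTable` and `PreconditionFailed` are hypothetical names, and the ETag here is a simple counter rather than the service's Timestamp-derived value.

```python
import itertools


class PreconditionFailed(Exception):
    """Raised when the caller's ETag no longer matches the stored entity."""


class FakeTable:
    """In-memory stand-in keyed by (PartitionKey, RowKey); not the real service."""

    def __init__(self):
        self._rows = {}
        self._versions = itertools.count(1)

    def insert(self, partition_key, row_key, props):
        key = (partition_key, row_key)
        if key in self._rows:
            raise KeyError(f"entity already exists: {key}")
        etag = next(self._versions)
        self._rows[key] = (etag, dict(props))
        return etag

    def get(self, partition_key, row_key):
        etag, props = self._rows[(partition_key, row_key)]
        return etag, dict(props)

    def replace(self, partition_key, row_key, props, if_match):
        etag, _ = self._rows[(partition_key, row_key)]
        if etag != if_match:
            # Someone else updated the entity since it was read.
            raise PreconditionFailed(f"stale etag {if_match}, current {etag}")
        new_etag = next(self._versions)
        self._rows[(partition_key, row_key)] = (new_etag, dict(props))
        return new_etag
```

Two clients that read the same entity hold the same ETag; the first replace succeeds and changes the ETag, so the second replace fails and that client has to re-read and retry.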
No Fixed Schema
First | Last | Birthdate | Fav Sport
Kim | Akers | 2/2/1981 |
Nancy | Anderson | 3/15/1965 | Canoeing
Mark | Hassall | May 1, 1976 |
Querying: ?$filter=Last eq 'Akers'
First | Last | Birthdate
Kim | Akers | 2/2/1981
Nancy | Anderson | 3/15/1965
Mark | Hassall | May 1, 1976
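The filter can be emulated locally to show what it selects, and the query string can be built with proper URL encoding. The helper names `filter_eq` and `filter_query` are hypothetical; the sample rows are the ones on the slide.

```python
from urllib.parse import quote

entities = [
    {"First": "Kim", "Last": "Akers", "Birthdate": "2/2/1981"},
    {"First": "Nancy", "Last": "Anderson", "Birthdate": "3/15/1965"},
    {"First": "Mark", "Last": "Hassall", "Birthdate": "May 1, 1976"},
]


def filter_eq(rows, prop, value):
    """Local emulation of `<prop> eq '<value>'`; rows lacking the property never match."""
    return [row for row in rows if row.get(prop) == value]


def filter_query(prop, value):
    """Build a URL-encoded $filter query string for a string equality test."""
    escaped = value.replace("'", "''")  # OData doubles single quotes inside string literals
    return "?$filter=" + quote(f"{prop} eq '{escaped}'")
```

Because there is no fixed schema, `row.get(prop)` is used so that entities without the property are skipped rather than raising an error.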
Purpose of the PartitionKey Entity Locality Entities in the same partition will be stored together Efficient querying and cache locality Endeavour to include partition key in all queries Entity Group Transactions Atomic multiple Insert/Update/Delete in same partition in a single transaction Table Scalability Target throughput – 500 tps/partition, several thousand tps/account Windows Azure monitors the usage patterns of partitions Automatically load balance partitions Each partition can be served by a different storage node Scale to meet the traffic needs of your table
Windows Azure Queues. Queues are highly scalable, available and provide reliable message delivery. Simple, asynchronous work dispatch. A storage account can create any number of queues. 8K message size limit and default expiry of 7 days. Programming semantics ensure that a message can be processed at least once: Get message to make the message invisible; Delete message to remove the message. Access is provided via REST.
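The get-then-delete semantics can be illustrated with an in-memory queue. This is a simplified sketch: `FakeQueue` is a hypothetical name, time is passed in explicitly, and details of the real service such as the receipt required for deletion are left out.

```python
import itertools


class FakeQueue:
    """In-memory stand-in showing visibility timeouts (not the real service)."""

    def __init__(self):
        self._messages = {}  # id -> [body, visible_at, dequeue_count]
        self._ids = itertools.count(1)

    def put(self, body):
        self._messages[next(self._ids)] = [body, 0.0, 0]

    def get(self, now, visibility_timeout=30.0):
        """Return (id, body, dequeue_count) and hide the message, or None."""
        for msg_id, msg in self._messages.items():
            if msg[1] <= now:
                msg[1] = now + visibility_timeout  # invisible until then
                msg[2] += 1
                return msg_id, msg[0], msg[2]
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)
```

If a worker crashes before calling `delete`, the message becomes visible again after the timeout and another worker picks it up, which is why processing is at least once and handlers should tolerate duplicates.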
Account, Queues and Messages An account can create many queues Queue Name is scoped by the account A Queue contains messages No limit on number of messages stored in a queue Set a limit for message expiration Messages Message size <= 8 KB To store larger data, store data in blob/entity storage, and the blob/entity name in the message Message now has dequeue count
Know The Scalability Targets. Single Blob Partition: throughput up to 60 MB/s. Single Queue/Table Partition: up to 500 transactions (entities or messages) per second. Storage Account: SLA 99.9% availability; capacity up to 100 TB; transactions up to 5000 entities per second; bandwidth up to 3 gigabits per second. Scale above the limits: partition between multiple storage accounts and partitions. When a limit is hit, the app may see '503 server busy'; applications should implement exponential back-off.
Automatic Load Balancing - Assignment. [Diagram: requests enter through a VIP and are routed to servers, each serving a set of partitions, on top of a distributed file system; legend: partition, server load]
Partition Keys In Each Abstraction. Blobs: Container name + Blob name. Every blob and its snapshots are in a single partition.
Container Name | Blob Name | Snapshot
image | annarbor/bighouse.jpg |
image | annarbor/bighouse.jpg | 2009-12-03T15:26:19.4466877Z
image | annarbor/denard.jpg |
backup | annarbor/bighouse.jpg |
Partition Keys In Each Abstraction. Entities: TableName + PartitionKey. Entities with the same PartitionKey value are served from the same partition.
Table Name | PartitionKey | RowKey | Zipcode | City
CustomerOrder | Alaska | Tina Fey | 99501 | Anchorage
CustomerOrder | Alaska | Sarah Palin | 99501 | Anchorage
CustomerOrder | Washington | Bill Johnson | 98053 | Redmond
Customers | Washington | Bill Johnson | 98053 | Redmond
Partition Keys In Each Abstraction. Messages: Queue Name. All messages for a single queue belong to the same partition.
Queue | Message
jobs | Message1
jobs | Message2
workflow | Message1
Table Inserts: Single Partition (SP) vs. Multiple Partitions (MP). [Chart: Entities / Sec (0 to 12000) against number of Extra Large VMs (1, 2, 5, 10, 16; 15 threads per VM); series: Successful Inserts (SP), Successful Inserts (MP)]
Table Get: Single Partition (SP) vs. Multiple Partitions (MP). [Chart: Entities / Sec (0 to 12000) against number of Extra Large VMs (1, 2, 5, 10, 16; 15 threads per VM); series: Successful Gets (SP), Successful Gets (MP)]
Are Unique Partition Key Values Sufficient To Scale? Avoid "Append/Prepend Only" patterns for high scale. Timestamp as Partition Key looks like an obvious choice. It is not a single partition as time moves forward, but it is append/prepend only: requests go to a single partition range, load balancing does not help, and the server may throttle. [Diagram: applications and a client write rows whose Partition Key (Timestamp) runs from 2010-10-15 02:00:01 through 2010-10-17 12:30:03, with other properties alongside; the key range is split across Server A and Server B, and all new rows land at the end of the range]
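Exponential back-off on a "server busy" response can be sketched as a retry wrapper. The names `ServerBusy` and `with_backoff` are hypothetical, and the delay values are illustrative defaults rather than recommended settings.

```python
import random
import time


class ServerBusy(Exception):
    """Stands in for an HTTP 503 'server busy' response (hypothetical)."""


def with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call operation(), doubling the wait after each ServerBusy failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the wrapper can be exercised without actually waiting.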
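One common way to avoid an append-only key, which the slide itself does not spell out, is to prefix the timestamp with a hash-derived bucket so that writes at any instant spread over several key ranges. The sketch below assumes this technique; `bucketed_partition_key` and the bucket count of 16 are illustrative choices.

```python
import hashlib


def bucketed_partition_key(timestamp: str, entity_id: str, buckets: int = 16) -> str:
    """Prefix the timestamp with a stable bucket number derived from entity_id."""
    digest = hashlib.md5(entity_id.encode("utf-8")).digest()
    bucket = digest[0] % buckets  # same entity_id always maps to the same bucket
    return f"{bucket:02d}_{timestamp}"
```

The trade-off is that a query over a time range has to be issued once per bucket and the results merged on the client.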