Building Web Apps with Azure Storage
1. Microsoft Cloud Day
K.Mohamed Faizal, Lead Consultant, NCS Pte Ltd
Building web applications with Azure Storage
Faizal has over a decade of experience in Information Technology, with a focus on enabling portals and on Internet and intranet application development.
In this session you will learn about the storage capabilities of Windows Azure: Blobs, Tables and Queues. Discover how to create storage accounts; upload and retrieve blobs and blob metadata; create, update and query tables; and create a simple service that uses a message queue for communication.
4. Agenda
Windows Azure Storage
Blobs
Tables
Queues
Scalability – Best Practices & Tips
Q&A
5. In General
Web applications
Relational database (SQL Server)
It is very difficult to design a scalable SQL Server deployment at low cost.
6. Common Considerations
Do you have enough space to store all the files you need?
How do you add more storage capacity?
If a disk crashes, where does your data go?
Is the storage block load balanced?
What if you lose your connection to the block? Is it redundant?
At what point do you max out your disk, in terms of reading and writing?
How do you evenly distribute load across all disks?
7. Storage Options
SQL Server
Network share
Distributed File System (DFS)
Network-attached storage (NAS)
Direct-attached storage (DAS)
Storage area network (SAN)
8. Storage Options
SQL Server
Unless you use high-availability technology (such as clustering, mirroring, or replication), your database server is likely to be a single point of failure in the system
Network share
This cheap solution offers no redundancy and provides no ability to scale out
Distributed File System (DFS)
Using replication ensures that there are no single points of failure in this solution and that the data is held on multiple machines.
9. Storage Options
Network-attached storage (NAS)
NAS devices can range from being pretty cheap to very expensive, depending on the levels of scalability, performance, and redundancy that you require from the device
Direct-attached storage (DAS)
Storage area network (SAN)
SAN devices support replication and are highly scalable (they scale much higher than do DAS devices), fault tolerant, high performing, and incredibly expensive.
10. I Need Storage
Storage that is:
Cost effective
Scalable
Durable
Highly available
11. What is Windows Azure Storage?
A cloud storage system that provides scalable, durable and highly available storage
Abstractions:
Blobs – Provides a simple interface for storing named files along with metadata for the file
Tables – Provides structured storage. A Table is a set of entities, which contain a set of properties
Queues – Provides reliable storage and delivery of messages for an application
Drives – Provides durable NTFS volumes for Windows Azure applications to use
13. Windows Azure and SQL Azure
Feature | Azure Storage | SQL Azure
Vision | Highly scalable, highly available store in the cloud | Scalable, highly available relational store in the cloud
Access | WCF Data Services / REST | SqlClient + T-SQL
Relational? | No | Yes, but with some limitations
Analogy | File system | RDBMS, as it is
Maximum amount of data in a single “database” | 100 TB | 50 GB (up to Oct 2010)
Price per GB per month | $0.15 | $9.99
14. Windows Azure Storage Account
User creates a globally unique storage account name
Can choose the geo-location to host the storage account
“US Anywhere”, “US North Central”, “US South Central”,
Can co-locate the storage account with the compute account
Receive a 256-bit secret key when creating the account
15. Storage Account Capacity
Storage account capacity at commercial availability
Each storage account can store up to 100 TB
Default limit of 5 storage accounts per subscription
16. Windows Azure Data Storage Concepts
Account
Container → Blobs: https://<account>.blob.core.windows.net/<container>
Table → Entities: https://<account>.table.core.windows.net/<table>
Queue → Messages: https://<account>.queue.core.windows.net/<queue>
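The three URL patterns above differ only in the service host. A minimal Python sketch of composing them; the helper `resource_url` and the sample names (`sally`, `images`, `customers`, `jobs`) are illustrative assumptions, and a real request would also have to be signed with the account key.

```python
# Sketch only: builds the REST endpoint URLs shown on the slide.
# resource_url is a hypothetical helper, not part of any Azure SDK.

SERVICE_HOSTS = {
    "blob": "blob.core.windows.net",    # container -> blobs
    "table": "table.core.windows.net",  # table -> entities
    "queue": "queue.core.windows.net",  # queue -> messages
}


def resource_url(account: str, service: str, name: str) -> str:
    """Return https://<account>.<service>.core.windows.net/<name>."""
    if service not in SERVICE_HOSTS:
        raise ValueError(f"unknown service: {service!r}")
    return f"https://{account}.{SERVICE_HOSTS[service]}/{name}"


if __name__ == "__main__":
    for service, name in [("blob", "images"), ("table", "customers"), ("queue", "jobs")]:
        print(resource_url("sally", service, name))
```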
19. Current Storage Solutions
SQL Server
Challenges with cost, performance and backup
Your database may grow very large
File System Storage
Load balance? Cost?
21. Windows Azure Blob Types
Block blobs
Targeted at streaming workloads
Each blob consists of a sequence of blocks
Two-phase commit: blocks are uploaded and then separately committed
Efficient continuation and retry
Send multiple out-of-order blocks in parallel and decide the block order during commit
Random range reads possible
Size limit 200 GB per blob
22. Block blobs
Local file: the file has variable-sized blocks (Block 1, Block 2, Block 3, … Block 5)
Upload blocks in parallel using PutBlock
Retry failed blocks
Commit the blob using PutBlockList
Result: cloud blob
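The upload flow above can be sketched in Python against an in-memory stand-in. `InMemoryBlockBlob`, `put_block`, `put_block_list` and `upload` are hypothetical names that only mirror the PutBlock and PutBlockList operations; the tiny 4-byte block size is for illustration, and the equal-length Base64 block IDs reflect the assumption that the service expects IDs of one length within a blob.

```python
import base64
import random
from concurrent.futures import ThreadPoolExecutor


class InMemoryBlockBlob:
    """Hypothetical stand-in for the service: uncommitted blocks plus a committed list."""

    def __init__(self, fail_once=()):
        self.uncommitted = {}
        self.committed = []
        self._fail_once = set(fail_once)  # block IDs whose first PutBlock "fails"

    def put_block(self, block_id, data):
        # Phase 1: stage a block; it is not part of the blob yet.
        if block_id in self._fail_once:
            self._fail_once.discard(block_id)
            raise IOError(f"transient failure for block {block_id}")
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids):
        # Phase 2: this list, not the upload order, defines the blob's content.
        self.committed = [self.uncommitted[b] for b in block_ids]

    def read(self):
        return b"".join(self.committed)


def block_id(index):
    # Fixed-width index so every Base64-encoded ID has the same length.
    return base64.b64encode(f"block-{index:06d}".encode()).decode()


def upload(blob, data, block_size=4, max_retries=3):
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    ids = [block_id(i) for i in range(len(chunks))]

    def send(i):
        for attempt in range(max_retries):
            try:
                blob.put_block(ids[i], chunks[i])
                return
            except IOError:
                if attempt == max_retries - 1:
                    raise  # give up on this block after max_retries attempts

    order = list(range(len(chunks)))
    random.shuffle(order)  # blocks may be sent in any order
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(send, order))  # list() surfaces any exception
    blob.put_block_list(ids)  # commit in the original file order
```

Because blocks are only staged until the final commit, a failed block can be retried on its own without re-sending the rest of the file.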
23. Windows Azure Blob Types
Page blobs
Targeted at random write workloads
Each blob consists of an array of pages
Size limit 1 TB per blob
Pages
Each page range write is committed on PUT
A page is 512 bytes in size
Writes must be aligned to 512-byte boundaries
Range reads possible
Pages that do not have data are zeroed out
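The page rules above (512-byte pages, aligned writes, zero-filled gaps) can be modelled with a sparse dictionary of pages. This is a hypothetical in-memory sketch in Python, not the storage client API; `SparsePageBlob`, `write_pages` and `read_range` are illustrative names.

```python
PAGE_SIZE = 512


class SparsePageBlob:
    """Hypothetical in-memory page blob: only pages that were written are stored."""

    def __init__(self, size):
        if size % PAGE_SIZE:
            raise ValueError("blob size must be a multiple of 512 bytes")
        self.size = size
        self.pages = {}  # page index -> 512 bytes of data

    def write_pages(self, offset, data):
        # Each write must start and end on a 512-byte boundary.
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("page writes must be 512-byte aligned")
        if offset + len(data) > self.size:
            raise ValueError("write extends past the end of the blob")
        for i in range(0, len(data), PAGE_SIZE):
            self.pages[(offset + i) // PAGE_SIZE] = data[i:i + PAGE_SIZE]

    def read_range(self, offset, length):
        # Any byte range can be read; never-written pages come back as zeros.
        out = bytearray()
        for pos in range(offset, offset + length):
            page = self.pages.get(pos // PAGE_SIZE)
            out.append(page[pos % PAGE_SIZE] if page is not None else 0)
        return bytes(out)
```

Storing only written pages is also why a sparsely used page blob is cheap: the empty regions hold no data.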
26. Sharing Your Files
By default, every blob request must be signed with the account owner’s key
To share your files:
Make the container public, or
Use a Shared Access Signature (SAS) to share pre-authenticated URLs with users
For SAS, use container-level access policies, as they allow access to be easily revoked
27. Shared Access Signatures
Services want to distribute access to blobs, but do not want to distribute their secret key
Can create a Shared Access Signature (SAS) using the secret key
Then give out the SAS to provide time-based access to blobs
Valid time range
st = start time (optional)
se = end time
Two resource levels of access to grant (sr)
c = container | b = blob
Four types of permissions, in any combination (sp)
r = read | w = write | d = delete | l = list
Signed identifier (si, optional)
Allows the time range and permissions to be stored in the blob service for the SAS
Provides instant revocation of the SAS
https://sally.blob.core.windows.net/images/pic1.jpg?st=2009-11-07T08:49Z&se=2009-11-07T09:49Z&sr=c&sp=rw&sig=3OSeIHP8haK%2fle9%2bBK3BX1DsdMM%3d&si=foo
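To make the query parameters concrete, here is a small Python sketch that pulls the fields out of the example URL and checks the time window and permission letters on the client side. `parse_sas` and `grants` are hypothetical helpers; they do not verify the signature, which only the service (holding the account key) can do.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def parse_sas(url):
    """Return the SAS query fields (st, se, sr, sp, sig, si) as a dict."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}


def _parse_time(value):
    # The example uses minute precision in UTC, e.g. 2009-11-07T08:49Z.
    return datetime.strptime(value, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)


def grants(fields, permission, now):
    """Illustrative client-side reading of the time window and permission letters.

    Only the service can decide access: it verifies sig with the account key and
    looks up any stored policy named by si. This sketch assumes se is in the URL.
    """
    start = _parse_time(fields["st"]) if "st" in fields else None
    end = _parse_time(fields["se"])
    in_window = (start is None or start <= now) and now <= end
    return in_window and permission in fields.get("sp", "")
```

With the example URL, a read at 09:00 UTC on 2009-11-07 falls inside the window, while a delete does not match `sp=rw`.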
29. What Are Snapshots?
Create a point-in-time, read-only copy of a blob
Every snapshot creates a new read-only, point-in-time copy
Charged only for unique blocks or pages, i.e. blocks or pages reused between the blob and its snapshots are charged once
For reuse, update with WritePages or PutBlock & PutBlockList
Restore snapshots using Copy Blob
Clean up your snapshots
30. What does unique mean?
Base blob alphabets.txt: ID=1 A, ID=2 BB
#1 snapshot=2011-04-10T19:26:24.8690267Z: ID=1 A, ID=2 BB
Base blob alphabets.txt (block ID=3 added): ID=1 A, ID=2 BB, ID=3 CCC
#2 snapshot=2011-05-10T19:26:24.8690267Z: ID=1 A, ID=2 BB, ID=3 CCC
31. What does unique mean?
#1 snapshot=2011-04-10T19:26:24.8690267Z: ID=1 A, ID=2 BB
#2 snapshot=2011-05-10T19:26:24.8690267Z: ID=1 A, ID=2 BB, ID=3 CCC
Base blob alphabets.txt (uploaded again): ID=1 A, ID=2 BB, ID=3 CCC
#3 snapshot=2011-05-10T19:28:24.8690267Z: ID=1 A, ID=2 BB, ID=3 CCC
• UploadFile/UploadText/UploadFromStream/UploadByteArray overwrites all blocks
• Charged for entire snapshot and base blob
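The two slides can be replayed with a toy billing model in Python. It counts data bytes only (no per-blob overhead or metadata) and gives every uploaded block its own identity; `upload_blocks` and `billed_bytes` are hypothetical helpers, not storage APIs.

```python
import itertools

_next_block = itertools.count(1)  # each call stands in for one newly uploaded block


def upload_blocks(contents):
    """Hypothetical upload: every block gets a new identity, even if its bytes repeat."""
    return [(next(_next_block), data) for data in contents]


def billed_bytes(*versions):
    """Bytes charged for a base blob and its snapshots: each unique block counts once."""
    unique = {}
    for blocks in versions:
        for block_key, data in blocks:
            unique[block_key] = len(data)
    return sum(unique.values())


if __name__ == "__main__":
    base = upload_blocks([b"A", b"BB"])
    snap1 = list(base)                              # snapshot shares the same blocks
    print(billed_bytes(base, snap1))                # 3
    base = base + upload_blocks([b"CCC"])           # PutBlock + PutBlockList: A, BB reused
    snap2 = list(base)
    print(billed_bytes(base, snap1, snap2))         # 6
    base = upload_blocks([b"A", b"BB", b"CCC"])     # UploadText-style overwrite: all new
    snap3 = list(base)
    print(billed_bytes(base, snap1, snap2, snap3))  # 12
```

The jump from 6 to 12 bytes is the second slide's point: identical content uploaded as new blocks is not shared with earlier snapshots.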
32. Windows Azure Content Delivery Network
Scenario
Frequently accessed blobs
Accessed from around the world
Desire
Windows Azure Content Delivery Network (CDN)
provides high-bandwidth global blob content delivery
18 locations globally (US, Europe, Asia, Australia and South
America), and growing
Blob service URL vs CDN URL:
Windows Azure Blob URL: http://sally.blob.core.windows.net/
Windows Azure CDN URL: http://<guid>.vo.msecnd.net/
Custom Domain Name for CDN: http://events.cohowinery.com/
33. Windows Azure Content Delivery Network
To enable CDN:
Register for CDN via Dev Portal
Set container images to public
[Diagram: a request for http://guid01.vo.msecnd.net/images/pic1.jpg is served by a CDN edge location, which caches pic1.jpg from the Windows Azure Blob Service (http://sally.blob.core.windows.net/images/pic1.jpg) for its TTL. Blobs in a container that is not public are marked "Not Accessible!".]
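The edge locations and TTL on this slide can be sketched as a cache that falls back to the blob service on a miss or after expiry. This Python model is a simplification (one edge location, an injectable clock, no HTTP); `EdgeCache` and `origin_url` are hypothetical names.

```python
import time


class EdgeCache:
    """Hypothetical CDN edge location: serves a cached copy until its TTL expires."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # path -> (expires_at, content)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[0]:
            return entry[1]  # hit: the blob service is not contacted
        content = self._fetch(path)  # miss or expired: fetch from the origin again
        self._entries[path] = (now + self._ttl, content)
        return content


ORIGIN = "http://sally.blob.core.windows.net"


def origin_url(path):
    # The CDN path (/images/pic1.jpg) maps to the same path on the blob endpoint.
    return ORIGIN + path
```

The TTL is the trade-off: a longer one offloads more traffic from the blob service, but an updated blob can keep being served stale until the cached copy expires.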
34. Windows Azure Drive
Provides a durable NTFS volume for Windows Azure applications
Use existing NTFS APIs
Easy migration path to the cloud
Durability and survival of data on application failover or hardware failure
All flushed and unbuffered writes to the drive are made durable
A Windows Azure Drive is a Page Blob
Mounts a Page Blob as an NTFS drive
Mounted by one VM at a time for read/write
A VM can dynamically mount up to 16 drives
Drives can be up to 1 TB
36. Windows Azure Tables
Provides Structured Storage
Massively Scalable and Durable Tables
Billions of entities (rows) and TBs of data
A storage account can contain many tables
No limit on number of entities (aka rows) in each table
Provides flexible schema
Familiar and easy-to-use API
WCF Data Services - .NET classes and LINQ
REST (OData Protocol) – with any platform or language
37. Windows Azure Tables
Is not relational
You cannot:
Create foreign key relationships between tables
Perform server-side joins between tables
Create custom indexes on the tables
There is no server-side Count(), for example
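Because joins and counts are not available on the server, they move into application code after the entities have been queried. A minimal Python sketch; the `customers` and `orders` data and the `join_on` and `count_by` helpers are hypothetical, and since every entity involved must be fetched first, this only suits modest result sets.

```python
from collections import Counter

# Hypothetical entities already returned by two separate table queries.
customers = [
    {"PartitionKey": "Washington", "RowKey": "Bill Johnson", "City": "Redmond"},
    {"PartitionKey": "Alaska", "RowKey": "Tina Fey", "City": "Anchorage"},
]
orders = [
    {"PartitionKey": "Washington", "RowKey": "order-1", "Customer": "Bill Johnson"},
    {"PartitionKey": "Washington", "RowKey": "order-2", "Customer": "Bill Johnson"},
    {"PartitionKey": "Alaska", "RowKey": "order-3", "Customer": "Tina Fey"},
]


def join_on(left, right, left_field, right_field):
    """Hash join in application memory, since the service cannot join tables."""
    lookup = {row[left_field]: row for row in left}
    return [(lookup[r[right_field]], r) for r in right if r[right_field] in lookup]


def count_by(rows, field):
    """Client-side replacement for a server-side Count() / GROUP BY."""
    return Counter(row[field] for row in rows)


if __name__ == "__main__":
    for customer, order in join_on(customers, orders, "RowKey", "Customer"):
        print(customer["City"], order["RowKey"])
    print(count_by(orders, "Customer"))
```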
38. Table Data Model
Table
A storage account can create many tables
Table name is scoped by account
Set of entities (i.e. rows)
Entity
Set of properties (columns)
Required properties
PartitionKey, RowKey and Timestamp
39. Required Entity Properties
PartitionKey & RowKey
Uniquely identifies an entity
Defines the sort order
Use them to scale your application
Timestamp
Read-only
Used for optimistic concurrency
40. Required Properties
All entities must have the following properties:
Timestamp
PartitionKey
RowKey
PartitionKey + RowKey = “primary key”
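A small Python model makes the primary-key and sort-order rules concrete. `Table` is a hypothetical in-memory stand-in; note that because both keys are strings, the order is lexical, which is why numeric RowKeys are usually zero-padded (for example "0009", "0010").

```python
class Table:
    """Hypothetical in-memory table keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        self._rows = {}

    def insert(self, entity):
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._rows:
            raise KeyError(f"entity {key} already exists")  # the pair must be unique
        self._rows[key] = dict(entity)

    def scan(self):
        # Entities come back sorted by PartitionKey, then RowKey. Both are
        # strings, so the order is lexical: "10" sorts before "9".
        return [self._rows[key] for key in sorted(self._rows)]
```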
41. Table Details
Not an RDBMS!
More on table modeling in Storage Strategies session
Table
Create, Query, Delete
Tables can have metadata
Entities
Insert
Update
Merge – Partial update
Replace – Update entire entity
Delete
Query
Entity Group Transactions
Multiple create, update and delete (CUD) operations in a single atomic transaction
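The difference between Merge and Replace, and the all-or-nothing behaviour of an entity group transaction, can be illustrated with plain dictionaries. This Python sketch is hypothetical (`merge`, `replace` and `entity_group_transaction` are not SDK calls) and leaves out the service's other batch limits.

```python
import copy


def merge(existing, update):
    """Merge (partial update): supplied properties are added or overwritten, others kept."""
    result = dict(existing)
    result.update(update)
    return result


def replace(existing, update):
    """Replace: the entity becomes exactly the supplied properties (keys preserved)."""
    result = {"PartitionKey": existing["PartitionKey"], "RowKey": existing["RowKey"]}
    result.update(update)
    return result


def entity_group_transaction(table, operations):
    """Apply (op, entity) pairs all-or-nothing; every entity must share one PartitionKey.

    `table` is a hypothetical dict mapping (PartitionKey, RowKey) -> entity.
    """
    if len({entity["PartitionKey"] for _, entity in operations}) != 1:
        raise ValueError("an entity group transaction must target exactly one partition")
    staged = copy.deepcopy(table)  # failures leave the real table untouched
    for op, entity in operations:
        key = (entity["PartitionKey"], entity["RowKey"])
        if op == "insert":
            if key in staged:
                raise KeyError(f"{key} already exists")
            staged[key] = dict(entity)
        elif op == "merge":
            staged[key] = merge(staged[key], entity)
        elif op == "replace":
            staged[key] = replace(staged[key], entity)
        elif op == "delete":
            del staged[key]
        else:
            raise ValueError(f"unknown operation {op!r}")
    table.clear()
    table.update(staged)  # commit: every change becomes visible at once
```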
42. Entity Properties
Entity can have up to 255 properties
Up to 1MB per entity
Mandatory Properties for every entity
PartitionKey & RowKey (only indexed properties)
Uniquely identifies an entity
Defines the sort order
Timestamp
Used for optimistic concurrency; exposed as an HTTP ETag
No fixed schema for other properties
Each property is stored as a <name, typed value> pair
No schema stored for a table
Properties can be the standard .NET types
String, binary, bool, DateTime, GUID, int, int64, and double
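Optimistic concurrency with the Timestamp-based ETag follows a read, modify, conditional-write pattern: a write carries the ETag it read and fails if the entity has changed since. A hypothetical in-memory Python sketch; the class and method names are illustrative, and real ETags are opaque strings rather than counters.

```python
import itertools


class PreconditionFailed(Exception):
    """Stands in for the HTTP 412 the service returns when If-Match does not match."""


class VersionedTable:
    """Hypothetical store that issues a new ETag on every write."""

    def __init__(self):
        self._rows = {}  # key -> (etag, entity)
        self._versions = itertools.count(1)

    def insert(self, key, entity):
        etag = str(next(self._versions))
        self._rows[key] = (etag, dict(entity))
        return etag

    def get(self, key):
        etag, entity = self._rows[key]
        return etag, dict(entity)

    def update(self, key, entity, if_match):
        current_etag, _ = self._rows[key]
        # "*" means "update unconditionally"; otherwise the ETags must match.
        if if_match != "*" and if_match != current_etag:
            raise PreconditionFailed(f"{key} was changed by someone else")
        etag = str(next(self._versions))
        self._rows[key] = (etag, dict(entity))
        return etag
```

The second writer in a race gets an error instead of silently overwriting the first writer's change; it can re-read and retry.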
43. No Fixed Schema
First | Last | Birthdate | Fav Sport
Kim | Akers | 2/2/1981 |
Nancy | Anderson | 3/15/1965 | Canoeing
Mark | Hassall | May 1, 1976 |
44. Querying
?$filter=Last eq 'Akers'
First | Last | Birthdate
Kim | Akers | 2/2/1981
Nancy | Anderson | 3/15/1965
Mark | Hassall | May 1, 1976
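The filter above can be generated from code; the main details are quoting string literals and including the PartitionKey whenever it is known, so the query stays within one partition. `build_filter` and `query_url` are hypothetical helpers, the `people` table and `People` partition are made up, and the `table()` resource form is an assumption about the REST query URL; real requests must also be signed.

```python
from urllib.parse import quote


def odata_string(value):
    """Quote a string literal for $filter: single quotes, embedded quotes doubled."""
    return "'" + value.replace("'", "''") + "'"


def build_filter(partition_key=None, **equals):
    """Build an 'and' of eq clauses, putting PartitionKey first when it is known."""
    clauses = []
    if partition_key is not None:
        clauses.append(f"PartitionKey eq {odata_string(partition_key)}")
    clauses += [f"{name} eq {odata_string(value)}" for name, value in equals.items()]
    return " and ".join(clauses)


def query_url(account, table, filter_expr):
    # Percent-encode the expression so spaces and quotes survive in the URL.
    return f"https://{account}.table.core.windows.net/{table}()?$filter={quote(filter_expr)}"
```

For example, `build_filter(Last="Akers")` yields `Last eq 'Akers'`, the expression on the slide.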
45. Purpose of the PartitionKey
Entity Locality
Entities in the same partition will be stored together
Efficient querying and cache locality
Endeavour to include partition key in all queries
Entity Group Transactions
Atomic multiple Insert/Update/Delete in same partition in a single
transaction
Table Scalability
Target throughput – 500 tps/partition, several thousand tps/account
Windows Azure monitors the usage patterns of partitions
Automatically load balance partitions
Each partition can be served by a different storage node
Scale to meet the traffic needs of your table
51. Windows Azure Queues
Queues are highly scalable and available, and provide reliable message delivery
Simple, asynchronous work dispatch
A storage account can create any number of queues
8 KB message size limit and default expiry of 7 days
Programming semantics ensure that a message is processed at least once
Get a message to make it invisible to other consumers
Delete the message to remove it
Access is provided via REST
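The get-then-delete pattern is what gives at-least-once processing: a worker that crashes after Get but before Delete lets the message reappear once its invisibility period ends. A minimal Python model with an injectable clock; `InMemoryQueue` is hypothetical, and the real service additionally requires the pop receipt returned by Get when deleting.

```python
import itertools
from collections import OrderedDict


class InMemoryQueue:
    """Hypothetical model of get-then-delete with a visibility timeout."""

    def __init__(self, clock):
        self._clock = clock
        self._messages = OrderedDict()  # id -> {"body", "visible_at", "dequeue_count"}
        self._ids = itertools.count(1)

    def put(self, body):
        self._messages[next(self._ids)] = {
            "body": body, "visible_at": self._clock(), "dequeue_count": 0}

    def get(self, visibility_timeout=30.0):
        now = self._clock()
        for msg_id, msg in self._messages.items():
            if msg["visible_at"] <= now:
                msg["visible_at"] = now + visibility_timeout  # hidden, not removed
                msg["dequeue_count"] += 1
                return msg_id, msg["body"], msg["dequeue_count"]
        return None  # nothing visible right now

    def delete(self, msg_id):
        # Only an explicit delete removes the message for good.
        self._messages.pop(msg_id, None)
```

Because a message can be delivered more than once, the work it triggers should be safe to repeat.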
52. Account, Queues and Messages
An account can create many queues
Queue name is scoped by the account
A queue contains messages
No limit on the number of messages stored in a queue
Set a limit for message expiration
Messages
Message size <= 8 KB
To store larger data, store the data in blob/entity storage, and the blob/entity name in the message
Each message now has a dequeue count
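The workaround for payloads above 8 KB can be sketched in Python with a list standing in for the queue and a dict for blob storage. The JSON message shape, the `payloads/` naming and the helper names are assumptions for illustration; a real implementation would typically also delete the blob once the message has been processed.

```python
import json
import uuid

MAX_MESSAGE_BYTES = 8 * 1024  # the 8 KB message limit from the slide


def enqueue(queue, blobs, payload):
    """Send small payloads inline; store large ones as a blob and send its name."""
    body = json.dumps({"inline": payload})
    if len(body.encode("utf-8")) <= MAX_MESSAGE_BYTES:
        queue.append(body)
        return
    name = f"payloads/{uuid.uuid4()}"  # hypothetical blob naming scheme
    blobs[name] = payload.encode("utf-8")
    queue.append(json.dumps({"blob": name}))


def dequeue(queue, blobs):
    """Return the payload, following the blob reference when there is one."""
    message = json.loads(queue.pop(0))
    if "inline" in message:
        return message["inline"]
    return blobs[message["blob"]].decode("utf-8")
```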
58. Know The Scalability Targets
Single Blob Partition
• Throughput up to 60 MB/s
Single Queue/Table Partition
• Up to 500 transactions (entities or messages) per second
Storage Account
• SLA – 99.9% Availability
• Capacity – Up to 100 TBs
• Transactions – Up to 5000 entities per second
• Bandwidth – Up to 3 gigabits per second
Scaling above the limits
• Partition between multiple storage accounts and partitions
• When a limit is hit, the app may see ‘503 server busy’; applications should implement exponential back-off
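A minimal Python sketch of exponential back-off around any storage call. `ServerBusy` is a hypothetical exception standing in for however your client surfaces a 503, and the delay values are illustrative rather than recommended settings; the random jitter is a common refinement, not something the slide specifies.

```python
import random
import time


class ServerBusy(Exception):
    """Stands in for an HTTP 503 'server busy' response."""


def with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying on ServerBusy with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide
            # Cap doubles each attempt (0.5 s, 1 s, 2 s, ...); a random wait below
            # the cap ("full jitter") stops many clients retrying in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```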
59. Automatic Load Balancing - Assignment
[Diagram: partition assignment across servers behind a VIP, showing partitions and server load, with all servers backed by the Distributed File System]
60. Partition Keys In Each Abstraction
Blobs – Container name + Blob name
• Every blob and its snapshots are in a single partition
Container Name | Blob Name | Snapshot
image | annarbor/bighouse.jpg |
image | annarbor/bighouse.jpg | 2009-12-03T15:26:19.4466877Z
image | annarbor/denard.jpg |
backup | annarbor/bighouse.jpg |
61. Partition Keys In Each Abstraction
Entities – TableName + PartitionKey value
• Entities with the same PartitionKey value are served from the same partition
Table Name | PartitionKey | RowKey | Zipcode | City
CustomerOrder | Alaska | Tina Fey | 99501 | Anchorage
CustomerOrder | Alaska | Sarah Palin | 99501 | Anchorage
CustomerOrder | Washington | Bill Johnson | 98053 | Redmond
Customers | Washington | Bill Johnson | 98053 | Redmond
62. Partition Keys In Each Abstraction
Messages – Queue Name
• All messages for a single queue belong to the same partition
Queue | Message
jobs | Message1
jobs | Message2
workflow | Message1
63. Table Inserts:
Single Partition (SP) vs. Multiple Partitions (MP)
[Chart: successful inserts in entities/sec (0 to 12,000) against the number of Extra Large VMs (1, 2, 5, 10, 16; 15 threads per VM), for a single partition (SP) and for multiple partitions (MP)]
64. Table Get:
Single Partition (SP) vs. Multiple Partitions (MP)
[Chart: successful gets in entities/sec (0 to 12,000) against the number of Extra Large VMs (1, 2, 5, 10, 16; 15 threads per VM), for a single partition (SP) and for multiple partitions (MP)]
65. Are Unique Partition Key Values Sufficient To
Scale?
Avoid “Append/Prepend Only” Patterns For High Scale
Timestamp as Partition Key… looks like an obvious choice
Each timestamp is a distinct partition key value, but as time moves forward all new writes go to the end of the key range
Requests go to a single partition range
Load balancing does not help
Server may throttle
[Diagram: a table with an append/prepend-only partition key (timestamp) and other properties, holding rows from 2010-10-15 02:00:01 to 2010-10-17 12:30:03 (with 100,000 and 80,000 more rows in between), split across Server A and Server B; the applications' new rows all land at the end of the range on Server B.]
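One common way to break the append-only pattern is to prefix the time-based key with a small, stable hash bucket, for example of a device or user ID, so concurrent writes spread over several key ranges that can be balanced across servers. A Python sketch under that assumption; `device_id` and `BUCKETS` are hypothetical, and the trade-off is that a query for a time range must fan out across all buckets.

```python
import hashlib

BUCKETS = 16  # hypothetical; size this to the write rate you need


def partition_key(timestamp: str, device_id: str) -> str:
    """Prefix a time-ordered key with a stable hash bucket.

    Writes at the same moment from different devices land in up to BUCKETS
    separate key ranges instead of all at the end of one range.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % BUCKETS
    return f"{bucket:02d}_{timestamp}"
```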
Slide Objectives: Understand Tables
Speaker Notes: Within a storage account, a developer may create named tables. Tables store data as entities. An entity is a collection of named properties and their values, similar to a row. Tables are partitioned to support load balancing across storage nodes. Each table has as its first property a partition key that specifies the partition an entity belongs to. The second property is a row key that identifies an entity within a given partition. The combination of the partition key and the row key forms a primary key that identifies each entity uniquely within the table. The Table service does not enforce any schema. A developer may choose to implement and enforce a schema on the client side.
Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx
Slide Objectives: Understand Tables and Entities
Speaker Notes: Tables store data as entities. An entity is a collection of named properties and their values, similar to a row (not an RDBMS, though). Tables are partitioned to support load balancing across storage nodes. Each table has as its first property a partition key that specifies the partition an entity belongs to. The second property is a row key that identifies an entity within a given partition. The combination of the partition key and the row key forms a primary key that identifies each entity uniquely within the table. The Table service does not enforce any schema. A developer may choose to implement and enforce a schema on the client side.
Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx http://msdn.microsoft.com/en-us/library/dd179338.aspx
Slide Objectives: Understand Flexible Entities
Speaker Notes: Tables store data as entities. A table can contain entities of any shape. There is no fixed schema, no schema checking and no strong typing; note that Birthdate is stored both as a DateTime value and as a string. Note also that we can add additional columns.
Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx
Slide Objectives: Understand The Basic Query Syntax
Speaker Notes: Tables store data as entities. Querying follows the ADO.NET Data Services specification (http://msdn.microsoft.com/en-us/library/cc668784.aspx). You should endeavour to always include the partition key to limit the scope of the query, since a partition is always served by a single storage node.
Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx
Slide Objectives: Understand The Partition Key
Speaker Notes: Tables are partitioned to support load balancing across storage nodes. A table's entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value. The partition key is a unique identifier for the partition within a given table, specified by the PartitionKey property. The partition key forms the first part of an entity's unique identifier within the table. The partition key may be a string value up to 1 KB in size. You must include the PartitionKey property in every insert, update, and delete operation.
Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/07/understanding-the-scalability-availability-durability-and-billing-of-windows-azure-storage.aspx http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx