Windows Azure storage services
Duy Lam

Windows Azure Storage Services



This presentation gives an overview of the storage services (tables, queues, blobs) in the Windows Azure platform. The demo project can be found here:!119

Published in: Technology
  • Cloud infrastructure is similar to shared hosting or virtual hosting, except that its capacity (CPU speed, memory) can be reconfigured at any time.
  • The Windows Azure platform can be used both by applications running in the cloud and by on-premises applications (applications that run on a local computer, for example a WinForms application).
    Windows Azure is simple to understand: it's a platform for running Windows applications and storing their data in the cloud.
    SQL Azure Database provides a cloud-based database management system (DBMS). This technology lets on-premises and cloud applications store relational and other types of data on Microsoft servers in Microsoft data centers.
    Windows Azure platform AppFabric provides cloud-based infrastructure services, in addition to running applications and storing data in the cloud.
  • Windows Azure is simple to understand: it's a platform for running Windows applications and storing their data in the cloud.
    Windows Azure runs on a large number of machines, all located in Microsoft data centers and accessible via the Internet.
    Applications (both cloud and on-premises) can access the Windows Azure storage service by using a RESTful approach.
  • Topics: instance -> VM -> role (a group of VMs); communication methods between Web and Worker roles; increasing the power of a VM.
    1. On Windows Azure, an application typically has multiple instances, each running a copy of all or part of the application's code. Each of these instances runs in its own Windows virtual machine (VM). These VMs are provided by a hypervisor that's specifically designed for use in the cloud. Yet a developer doesn't explicitly create these VMs. Instead, a developer creates applications using Web roles and/or Worker roles, then tells Windows Azure how many instances of each role to run. Windows Azure silently creates a VM for each instance, then runs the application in those VMs. A Web role can be implemented using ASP.NET, WCF, or another technology that works with IIS. A Worker role instance is quite similar to a Web role instance. The key difference is that a Worker role doesn't have IIS preconfigured to run in each instance, and so unlike Web roles, Worker role instances aren't hosted in IIS. A Worker role can still accept requests from the outside world, however, and developers can even run another Web server, such as Apache, in a Worker role instance.
    2. Worker role instances can communicate with Web role instances in various ways. One option is to use Windows Azure storage queues. A Web role instance can insert a work item in a queue, and a Worker role instance can remove and process this item. Another option is for Worker roles and Web roles to set up direct connections via Windows Communication Foundation (WCF) or another technology.
    3. For both Web roles and Worker roles, Windows Azure lets developers choose from four VM sizes: one core, two cores, four cores, and eight cores. Since each VM is assigned one or more cores, applications can have predictable performance. And to increase performance, an application's owner can increase the number of running instances specified in the application's configuration file. The Windows Azure fabric will then spin up new VMs, assign them to cores, and start running more instances of this application.
  • Summary:
    Blob: just right for some kinds of data, but too unstructured for many situations.
    Table: despite being called "tables", these aren't relational tables.
    Queue: not intended to store data, but to provide a way for Web role instances to communicate with Worker role instances.
    More about storage:
    1. The simplest way to store data in Windows Azure storage is to use blobs. There's a simple hierarchy: there are one or more containers for an account, each of which holds one or more blobs. Blobs can be of any size (potentially as large as a terabyte each), and to make transferring large blobs more efficient, each one can be subdivided into blocks. Blobs can also have associated metadata, such as information about where a JPEG photograph was taken or who the composer is for an MP3 file. And to make distributed access to blob data more efficient, Windows Azure provides a content delivery network (CDN), storing frequently accessed data at locations closer to the applications that use it. Another way to use blobs is through Windows Azure XDrives, which can be mounted by a Web role instance or Worker role instance. The underlying storage for an XDrive is a blob, and so once a drive is mounted, the instance can read and write file system data that gets stored persistently in a blob.
    2. Don't be misled by the name: these aren't relational tables. Even though they're called "tables", the data they contain is actually stored in a set of entities with properties. A table has no defined schema; instead, properties can have various types, such as int, string, Bool, or DateTime. An application can access a table's data using ADO.NET Data Services or LINQ. A single table can be quite large, with billions of entities holding terabytes of data, and Windows Azure storage can partition it across many servers if necessary to improve performance.
    3. A primary use of queues is to provide a way for Web role instances to communicate with Worker role instances. For example, a user might submit a request to perform some compute-intensive task via a Web page implemented by a Windows Azure Web role. The Web role instance that receives this request can write a message into a queue describing the work to be done. A Worker role instance that's waiting on this queue can then read the message and carry out the task it specifies. Any results can be returned via another queue or handled in some other way.
    4. Regardless of how it's stored (in blobs, tables, or queues), all data held in Windows Azure storage is replicated three times. This replication allows fault tolerance, since losing a copy isn't fatal. The system guarantees consistency, however, so an application that reads data it has just written will get what it expects.
    Windows Azure storage can be accessed either by a Windows Azure application or by an application running somewhere else. In both cases, all three Windows Azure storage styles use the conventions of REST to identify and expose data. Everything is named using URIs and accessed with standard HTTP operations. A .NET client can rely on ADO.NET Data Services and LINQ, but access to Windows Azure storage from, say, a Java application can just use standard REST.
    The Windows Azure platform charges independently for compute and storage resources. This means that an on-premises application could use just Windows Azure storage, accessing its data in the RESTful way just described. And because that data can be accessed directly from non-Windows Azure applications, it remains available even if the Windows Azure application that uses it isn't running.
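The "everything is named using URIs" point can be illustrated with a small, library-neutral sketch. It is not part of the deck: the function name `storage_url` is hypothetical, and the `core.windows.net` endpoint suffix is an assumption based on the public Windows Azure endpoints rather than something the slides state (local development storage uses different addresses).

```python
from urllib.parse import quote

# The three storage services named in the deck.
SERVICES = ("blob", "queue", "table")


def storage_url(account, service, resource_path="", secure=True,
                suffix="core.windows.net"):
    """Build <http|https>://<account>.<service>.<suffix>/<resource-path>.

    `suffix` is an assumption (public cloud endpoint), not taken from the slides.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    scheme = "https" if secure else "http"
    # Percent-encode the path but keep "/" so directory-like names survive.
    path = quote(resource_path.lstrip("/"))
    return f"{scheme}://{account}.{service}.{suffix}/{path}"


# "myaccount" is a placeholder account name.
print(storage_url("myaccount", "blob", "movies/Action/Rocky1.wmv"))
```

Any HTTP client can then issue standard GET/PUT/DELETE requests against such a URI, which is what makes the service usable from non-.NET applications.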
  • Windows Azure Table keeps track of the name and typed value for each property in each entity. An application may simulate a fixed schema on the client side by ensuring that all the entities it creates have the same set of properties.
    Table names:
      Table names may contain only alphanumeric characters.
      A table name may not begin with a numeric character.
      Table names are case-insensitive.
      Table names must be from 3 through 63 characters long.
    Property names:
      Only alphanumeric characters and '_' are allowed.
    Entities and properties:
      An entity can have at most 255 properties, including the mandatory system properties PartitionKey, RowKey and Timestamp. All other properties in an entity have a name defined by the application.
      PartitionKey and RowKey are of string type, and each key is limited to 1KB in size.
      Timestamp is a read-only, system-maintained property which should be treated as opaque.
      No fixed schema: no schema is stored by Windows Azure Table, so all of the properties are stored as <name, typed value> pairs. This means that two entities in the same table can have very different properties. A table can even have two entities with the same property name but different types for the property value. However, property names must be unique within a single entity.
      The combined size of all data in an entity cannot exceed 1MB. This size includes the size of the property names as well as the size of the property values or their types, which includes the two mandatory key properties (PartitionKey and RowKey).
      Supported property types are: Binary, Bool, DateTime, Double, GUID, Int, Int64, String.
    Practice:
      Visual Studio project template: introduce the Cloud template (a cloud solution with a web role and a worker role).
      Associate an existing web application project with a web role.
      Role settings: add a connection string to use local development storage.
      "Hello world" Windows Azure Table: open the demo project and introduce the CRUD operations.
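The naming and size rules above are easy to check on the client before a request is sent. The following is a minimal sketch of such a check, not part of the demo project: the function names are hypothetical, keys are measured as UTF-8 bytes (an assumption, since the notes only say "1KB"), and the 1MB entity-size rule is left out because it depends on the wire encoding.

```python
import re

# 3-63 characters, ASCII alphanumeric only, must not start with a digit.
TABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{2,62}")
# Property names: alphanumeric characters and '_' only.
PROPERTY_NAME = re.compile(r"[A-Za-z0-9_]+")
SYSTEM_PROPERTIES = {"PartitionKey", "RowKey", "Timestamp"}
MAX_PROPERTIES = 255      # includes the three system properties
MAX_KEY_BYTES = 1024      # assumption: measured here as UTF-8 bytes


def is_valid_table_name(name):
    return TABLE_NAME.fullmatch(name) is not None


def check_entity(entity):
    """Return a list of rule violations for an entity given as a dict."""
    problems = []
    for key in ("PartitionKey", "RowKey"):
        value = entity.get(key)
        if not isinstance(value, str):
            problems.append(f"{key} must be a string")
        elif len(value.encode("utf-8")) > MAX_KEY_BYTES:
            problems.append(f"{key} exceeds 1 KB")
    for name in entity:
        if PROPERTY_NAME.fullmatch(name) is None:
            problems.append(f"invalid property name: {name!r}")
    # Timestamp is added by the server, so count it even if absent locally.
    if len(set(entity) | SYSTEM_PROPERTIES) > MAX_PROPERTIES:
        problems.append("more than 255 properties")
    return problems


print(is_valid_table_name("Movies"), is_valid_table_name("1movies"))
print(check_entity({"PartitionKey": "Action", "RowKey": "Rocky1", "Title": "Rocky"}))
```

Because no schema is stored server-side, a check like this is also one place where an application could enforce its own "fixed schema" convention.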
  • Timestamp property:
      Read only.
      Optimistic concurrency: the server uses it to detect conflicting updates. It carries the version of the entity as it was when retrieved from the server.
    PartitionKey and RowKey properties:
      PartitionKey and RowKey work together to provide uniqueness, acting as the primary key for a single entity in a table. Each entity in a table must have a unique PartitionKey/RowKey combination.
    PartitionKey: used for physically partitioning (scaling) the tables.
      For example, consider a table that contains information about food and has PartitionKeys that correspond to the food types, such as Vegetable, Fruit and Grain. In the summer, the rows in the Vegetable partition might be very busy (becoming a so-called "hot" partition). The service can load balance the Food table by moving the Vegetable partition to a different server to better handle the many requests made to the partition. If you anticipate more activity on that partition than a single server can handle, you should consider creating more-granular partitions such as Vegetable_Root and Vegetable_Squash.
    RowKey: defines uniqueness within a partition.
      Example of how to define a RowKey: In combination with PartitionKey, it can define uniqueness within a table for each row. For example, I know another Julie Lerman (truly I do). So the RowKey will be critical in differentiating us when we share a PartitionKey of lerman_julie. You can also use RowKey to help with sorting. So then, what would be useful RowKeys for Julie Lerman the elder (that's me) and Julie Lerman the younger? A GUID will certainly do the trick for identity, but it does nothing for searches or sorting. In this case, a combination of values would probably be best. What else differentiates us? We live on opposite ends of the United States, but locations can change so that's not useful for a key. Certainly our date of birth is different (by more than 20 years) and that's a static value. But there's always the chance that another Julie Lerman with my birth date exists somewhere in the world and could land in my database: highly implausible, but not impossible. After all of the deliberation I might go through, birth date may still not be a value on which my application is searching or sorting.
    Both keys serve as sort keys: returned entities are sorted by PartitionKey and then by RowKey.
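Since both keys are strings, the "sorted by PartitionKey and then RowKey" rule has a practical consequence: numeric row keys need padding to sort as expected. The sketch below illustrates this with plain Python string ordering, which is only an approximation of the service's ordering; the helper names and the sample entities (beyond the Vegetable/Fruit/Grain partitions from the notes) are hypothetical.

```python
def in_table_order(entities):
    """Order entities the way results are described: PartitionKey, then RowKey."""
    return sorted(entities, key=lambda e: (e["PartitionKey"], e["RowKey"]))


def padded_row_key(number, width=10):
    """Zero-pad a number so that string order matches numeric order."""
    return str(number).zfill(width)


foods = [
    {"PartitionKey": "Vegetable", "RowKey": "Carrot"},
    {"PartitionKey": "Fruit", "RowKey": "Banana"},
    {"PartitionKey": "Grain", "RowKey": "Rice"},
    {"PartitionKey": "Fruit", "RowKey": "Apple"},
]

for food in in_table_order(foods):
    print(food["PartitionKey"], food["RowKey"])

# String comparison, not numeric: "10" sorts before "9" unless padded.
print("10" < "9", padded_row_key(9) < padded_row_key(10))
```

The same idea applies to dates used in a RowKey: a fixed-width form such as year-month-day keeps string order and chronological order aligned.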
  • How does the PartitionKey help the system scale? When the cloud sees that two partitions (Action and Animation) are used a lot, it dedicates one server (ServerA) to hold only the entities in these two partitions and moves the rest to another server (ServerB).
  • EGT: This allows the application to atomically perform multiple Create/Update/Delete operations across multiple entities in a single batch request to the storage system, as long as all the entities have the same partition key value and are in the same table. Either all the entity operations succeed in the single transaction or they all fail, and snapshot isolation is provided for the execution of the transaction.
    Continuation tokens:
    Why? A query returns a continuation token when:
      The number of entities is greater than the maximum number of entities allowed in the response by the server (currently 1000).
      The total size of the entities in the response is greater than the maximum size of a response (currently 4MB, including the property names but excluding the XML tags used for REST).
      The query executed for more than the server-defined timeout (currently 60 seconds).
    How? The response includes a continuation token as custom headers. For a query over your entities, the custom headers representing a continuation token are:
      x-ms-continuation-NextPartitionKey
      x-ms-continuation-NextRowKey
    The client should pass both these values, if they exist, back into the next query as HTTP query options, with the rest of the query remaining the same. The client will then get the next set of entities starting at the continuation token. The next query looks as follows:
      http://<serviceUri>/Blogs?<originalQuery>&NextPartitionKey=<someValue>&NextRowKey=<someOtherValue>
    The continuation token must be treated as an opaque value. It reflects the starting point for the next query, and may not correspond to an actual entity in the table. If a new entity is added such that Key(new entity) < continuation token, then this new entity will not be returned when the query is reissued using the continuation token. However, new entities added such that Key(new entity) > continuation token will be returned in a subsequent query issued using the continuation token.
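The continuation-token handshake described above amounts to a simple loop: copy the two response headers into query options and reissue the same query until no token comes back. Here is a minimal sketch of that loop with the HTTP call abstracted away. The function names are hypothetical, `fetch_page` stands in for whatever client code sends the request, and the header lookup assumes the exact casing shown in the notes (real HTTP headers are case-insensitive).

```python
from urllib.parse import urlencode

# Response header -> query option for the next request.
HEADER_TO_OPTION = {
    "x-ms-continuation-NextPartitionKey": "NextPartitionKey",
    "x-ms-continuation-NextRowKey": "NextRowKey",
}


def continuation_options(headers):
    """Pick out whichever continuation values are present; treat them as opaque."""
    return {option: headers[header]
            for header, option in HEADER_TO_OPTION.items() if header in headers}


def next_query_url(base_url, original_query, headers):
    """Return the URL of the follow-up query, or None when there is no token."""
    options = continuation_options(headers)
    if not options:
        return None
    return f"{base_url}?{original_query}&{urlencode(options)}"


def fetch_all(fetch_page):
    """Drain a query. `fetch_page(options)` must return (entities, headers)."""
    entities, options = [], {}
    while True:
        page, headers = fetch_page(options)
        entities.extend(page)
        options = continuation_options(headers)
        if not options:
            return entities
```

A caller would wrap its real request code in `fetch_page`; a page can legitimately come back with few or no entities and still carry a token, which is why the loop tests for the token rather than for an empty page.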
  • Other query operators are not supported: Select (all properties must be retrieved; projection is not supported), Group by, Order by, Min, Max, and so on.
    Cross-table consistency: In the MicroBlogging example, we had two tables, Channels and Blogs. The application is responsible for maintaining the consistency between the Channels table and the Blogs table. For example, when a channel is removed from the Channels table, the application should delete the corresponding blogs from the Blogs table. Failures can occur while the application is synchronizing the state of multiple tables. The application needs to be designed to handle such failures, and should be able to resume from where it left off.
  • The figure above illustrates a simple but common scenario for cloud applications: a set of web servers hosts the frontend logic that handles web requests, and a set of backend processing servers implements the business logic of the application. The web frontend nodes communicate with the backend processing nodes via a set of queues. Persistent state of the application can be stored in Windows Azure Blob storage and Windows Azure Table storage.
    Example: consider an online video hosting service. Users can upload videos to this application; the application can then automatically convert the video file into different media formats and store them. In addition, the application will automatically index the description information of the videos so that they can be easily searched (e.g. based on keywords in the descriptions, actors, directors, title, and so on). Such an application can use the architecture described earlier. The web frontends implement the presentation layer and handle web requests from users. Users may upload videos via the web frontends. The video media files can be stored as blobs inside the blob store. The application may also maintain a set of tables to keep track of the video files it has, as well as the indexes used for search. The backend processing servers are responsible for converting the input video files into different formats and storing them in blob storage. The backend servers are also responsible for updating the tables for this application in table storage. Once the frontend servers receive a user request (e.g. a video upload request), they can generate a work item and push it into the request queue. The backend servers can then take these work items off the queue and process them accordingly. On successful processing of each work item, the backend server should delete it from the queue so as to avoid duplicate processing by another backend server.
  • Scalability: The use of queues decouples different parts of the application, making it easier to scale different parts of the application independently. In this example, the frontends and the backend servers are decoupled, and they communicate via the queues. This allows the number of backend servers and the number of frontend servers to be adjusted independently without affecting the application logic, so an application can easily scale out its critical components by adding more resources/machines to them.
    Queues also allow resources to be used efficiently within an application. Separate queues can be used for work items of different priorities and/or different weights, and separate pools of backend servers can process these different queues. In this way, the application can allocate appropriate resources (e.g. in terms of the number of servers) to each pool, thereby efficiently using the available resources to meet traffic needs with different characteristics. For example, work items that are mission critical can be put into a separate queue, so that they can be processed earlier without having to wait for other work to complete. In addition, work items that will consume a large amount of resources (such as video conversion) may use their own queue. Different pools of backend servers can be used to process work items in each of these queues. The application can adjust the size of each of these pools independently according to the traffic it receives.
    Decoupling front-end roles from back-end roles: Different parts of the application are decoupled due to the use of queues, which allows significant flexibility and extensibility in how the application can be built. The messages in the queue can be in a standard and extensible format, such as XML, so that the components communicating at both ends of the queue do not depend on each other as long as they can understand the messages in the queue. Different technologies and programming languages can be used to implement different parts of the system with maximum flexibility. Furthermore, changes within a component are transparent to the rest of the system. For example, a component can be rewritten using a totally different technology or programming language, and the system still works seamlessly without changing the other components, since the components are decoupled using queues.
    Traffic bursts: A queue provides buffering to absorb traffic bursts and reduce the impact of individual component failures. In the earlier example, there can be occasions where a burst of requests arrives in a short interval and the backend servers cannot process all the requests quickly. In this case, instead of being dropped, the requests are buffered in the queue, and the backend servers can process them at their own pace and eventually catch up. This allows the application to handle bursty traffic without losing availability.
  • Consider the following sequence of operations:
    1. C1 dequeues a message off the queue. This operation returns message 1 and makes message 1 invisible in the queue for 30 seconds (we assume in this example that the default VisibilityTimeout of 30 seconds is used).
    2. Then C2 comes in and dequeues another message off the queue. Since message 1 is still invisible, this operation will not see message 1 and returns message 2 to C2.
    3. When C2 completes processing of message 2, it calls Delete to remove message 2 from the queue.
    4. Now let us assume that C1 crashes and does not complete processing message 1 before it dies, and therefore the message was also not deleted by C1.
    5. After message 1 has passed its VisibilityTimeout, it will reappear on the queue.
    6. After message 1 reappears on the queue, a later dequeue call from C2 will be able to retrieve it. C2 will then process message 1 to completion and delete it from the queue.
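The six steps above can be replayed against a small in-memory model. This is a sketch for illustration only, not the queue service or its client library: the class `SimulatedQueue` is hypothetical, time is passed in explicitly as `now` (in seconds) to keep the run deterministic, message IDs are simple counters rather than GUIDs, and messages are scanned in insertion order even though the real service gives no ordering guarantee.

```python
import itertools


class SimulatedQueue:
    """In-memory stand-in for a queue with a visibility timeout."""

    def __init__(self, default_visibility_timeout=30):
        self.default_visibility_timeout = default_visibility_timeout
        self._messages = []
        self._ids = itertools.count(1)
        self._receipts = itertools.count(1)

    def put(self, body):
        message = {"id": f"msg-{next(self._ids)}", "body": body,
                   "visible_at": 0, "pop_receipt": None}
        self._messages.append(message)
        return message["id"]

    def get(self, now, visibility_timeout=None):
        """Return the first visible message and hide it, or None if none is visible."""
        timeout = (self.default_visibility_timeout
                   if visibility_timeout is None else visibility_timeout)
        for message in self._messages:
            if message["visible_at"] <= now:
                message["visible_at"] = now + timeout
                # Each retrieval issues a fresh receipt.
                message["pop_receipt"] = f"receipt-{next(self._receipts)}"
                return dict(message)
        return None

    def delete(self, message_id, pop_receipt):
        """Delete only if the caller holds the latest receipt for the message."""
        for index, message in enumerate(self._messages):
            if message["id"] == message_id and message["pop_receipt"] == pop_receipt:
                del self._messages[index]
                return True
        return False


queue = SimulatedQueue()
queue.put("message 1")
queue.put("message 2")

c1_message = queue.get(now=0)                 # step 1: C1 gets message 1
c2_message = queue.get(now=1)                 # step 2: C2 gets message 2
queue.delete(c2_message["id"], c2_message["pop_receipt"])  # step 3
# step 4: C1 crashes and never deletes message 1
still_hidden = queue.get(now=20)              # nothing visible yet
retried = queue.get(now=31)                   # steps 5-6: message 1 is back
queue.delete(retried["id"], retried["pop_receipt"])
```

Because a message can reappear like this, the processing of a work item should be safe to repeat.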
  • Queue:
      There is no limit on the number of messages stored in a queue.
      A message is stored for at most a week. The system will garbage collect messages that are more than a week old.
      Queues can have metadata associated with them. Metadata is in the form of <name, value> pairs, up to 8KB in size per queue.
    Messages:
      Each message can be up to 8KB in size. Note that when you put a message into the store, the message data can be binary. But when you get the messages back from the store, the response is in XML format, and the message data is returned base64 encoded.
      There is no guaranteed return order of the messages from a queue, and a message may be returned more than once.
    Definitions of some parameters used by the Azure Queue Service:
      MessageID: A GUID value that identifies the message in the queue.
      VisibilityTimeout: An integer value that specifies the message's visibility timeout in seconds. The maximum value is 2 hours. The default message visibility timeout is 30 seconds.
      PopReceipt: A string which is returned for every message retrieved. This string, along with the MessageID, is required in order to delete a message from the queue. It should be treated as opaque, since its format and contents can change in the future.
      MessageTTL: This specifies the time-to-live interval for the message, in seconds. The maximum time-to-live allowed is 7 days. If this parameter is omitted, the default time-to-live is 7 days. If a message is not deleted from a queue within its time-to-live, then it will be garbage collected and deleted by the storage system.
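The limits and the base64 round trip above can be captured in a few helpers. This is a hedged sketch with hypothetical function names. The notes do not say whether the 8KB limit applies to the raw or the encoded payload, so the check below conservatively measures the base64 form, which is the larger of the two.

```python
import base64

MAX_MESSAGE_BYTES = 8 * 1024                    # 8KB per message
MAX_TTL_SECONDS = 7 * 24 * 60 * 60              # 7 days, also the default
MAX_VISIBILITY_TIMEOUT_SECONDS = 2 * 60 * 60    # 2 hours
DEFAULT_VISIBILITY_TIMEOUT_SECONDS = 30


def encode_body(data):
    """Binary payload -> base64 text, the form in which message data is returned."""
    return base64.b64encode(data).decode("ascii")


def decode_body(text):
    """Base64 text from a response -> original binary payload."""
    return base64.b64decode(text)


def fits_in_message(data):
    # Assumption: measure the encoded size, the stricter interpretation.
    return len(encode_body(data)) <= MAX_MESSAGE_BYTES


def effective_ttl(ttl_seconds=None):
    """Apply the default when omitted and reject values above the maximum."""
    if ttl_seconds is None:
        return MAX_TTL_SECONDS
    if not 0 < ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("time-to-live must be between 1 second and 7 days")
    return ttl_seconds


payload = b"\x00\xffconvert video 42"
assert decode_body(encode_body(payload)) == payload
```

For work items larger than a message allows, a common pattern is to store the payload in a blob and put only the blob's name in the queue message.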
  • Windows Azure Blob enables applications to store large objects, up to 200GB each, in the cloud. It supports a massively scalable blob system, where hot blobs will be served from many servers to scale out and meet the traffic needs of your application. The Azure Blob system is highly available and durable. You can always access your data from anywhere at any time, and the data is replicated at least 3 times for durability. In addition, strong consistency is provided to ensure that the object is immediately accessible once it is added or updated; a subsequent read will immediately see the changes made by a previously committed write.
  • Storage account:
      An account can have many blob containers.
      This is the highest level of the namespace for accessing blobs.
    Blob container:
      A container provides a grouping of a set of blobs.
      The container name is scoped by the account (the same name can appear in different accounts).
      Sharing policies are set at the container level. Currently "Public READ" and "Private" are supported. When a container is "Public READ", all its contents can be read by anyone without requiring authentication. When a container is "Private", only the owner of the corresponding account can access the blobs in that container, with authenticated access.
      Containers can also have metadata associated with them. Metadata is in the form of <name, value> pairs, up to 8KB in size per container.
    Blob:
      Blobs are stored in and scoped by blob containers.
      You can upload a blob of up to 64MB using a single PUT blob request. Each blob can be up to 200GB. To go up to the 200GB blob size limit, one must use the block interface.
      A blob has a unique string name within the container.
      Blobs can have metadata associated with them, which are <name, value> pairs, up to 8KB in size per blob. The blob metadata can be read and set separately from the blob data.
    Page blobs (refer to the Windows Azure Drive whitepaper): optimized for random read/write operations, and provide the ability to write to a range of bytes in a blob. Page blobs are available only with version 2009-09-19. Page blobs are a collection of pages. A page is a range of data that is identified by its offset from the start of the blob. All pages must align to 512-byte page boundaries. Unlike writes to block blobs, writes to page blobs happen in place and are immediately committed to the blob. The maximum size for a page blob is 1 TB. A page written to a page blob may be up to 1 TB in size.
    Block blobs: optimized for streaming. This type of blob is the only blob type available with versions prior to 2009-09-19. A single blob is uploaded as a sequence of contiguous blocks, each of which can be at most 4 MB in size. Each block has a unique ID/name (64 bytes in size), and this unique ID is scoped by the name of the blob being uploaded. For example, the first block could be called "Block 0001", the second block "Block 0002", etc. After all of the blocks are stored in Windows Azure Storage, we commit the list of uncommitted blocks to represent the blob name they were associated with. This is done with a PUT on the blob's URL, with the query specifying that this is a blocklist command. When this operation succeeds, the list of blocks, in the order in which they were listed, represents the readable version of the blob. The blob can then be read using the GET blob commands.
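The block interface described above (upload blocks, then commit an ordered block list) can be modelled locally to show why the order of the committed list matters. This sketch performs no network calls and the function names are hypothetical. Block IDs are fixed-width names in the spirit of the notes' "Block 0001"; they are base64-encoded here, which is my understanding of what the REST interface expects, so treat that detail as an assumption.

```python
import base64

MAX_BLOCK_BYTES = 4 * 1024 * 1024   # each block is at most 4 MB
PAGE_BYTES = 512                    # page blobs align to 512-byte boundaries


def split_into_blocks(data, block_size=MAX_BLOCK_BYTES):
    """Return [(block_id, chunk), ...] covering `data` in order."""
    if not 0 < block_size <= MAX_BLOCK_BYTES:
        raise ValueError("block size must be between 1 byte and 4 MB")
    blocks = []
    for index, start in enumerate(range(0, len(data), block_size)):
        # Fixed-width name so every ID has the same length.
        name = f"block-{index:06d}".encode("ascii")
        block_id = base64.b64encode(name).decode("ascii")
        blocks.append((block_id, data[start:start + block_size]))
    return blocks


def assemble(uploaded, block_list):
    """Model of the commit: the blob is the listed blocks, in listed order."""
    return b"".join(uploaded[block_id] for block_id in block_list)


def is_page_aligned(offset, length):
    """Check a page-blob write range against the 512-byte alignment rule."""
    return offset % PAGE_BYTES == 0 and length % PAGE_BYTES == 0


data = bytes(range(10))
blocks = split_into_blocks(data, block_size=4)   # tiny blocks for the demo
uploaded = dict(blocks)
committed = assemble(uploaded, [block_id for block_id, _ in blocks])
assert committed == data
```

Since blocks are identified by ID and only become readable on commit, they can be uploaded in parallel or retried individually before the final block list is sent.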
  • Use CloudBerry Explorer or Visual Studio to browse blobs, because both support viewing the folder-like hierarchy in the blob service.
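The folder view these tools show is derived from the blob names: the container is flat, and a "directory" is just a name prefix ending in a delimiter. The sketch below shows that derivation over a plain list of names. The function names are hypothetical, and `Animation/Up.wmv` and `readme.txt` are made-up sample names added next to the `Action/Rocky*.wmv` names from the slides.

```python
def list_directories(blob_names, prefix="", delimiter="/"):
    """Return the distinct virtual directories directly under `prefix`."""
    directories = set()
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            directories.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
    return sorted(directories)


def list_blobs_in(blob_names, directory, delimiter="/"):
    """Return blobs that sit directly inside `directory` (no deeper levels)."""
    return sorted(name for name in blob_names
                  if name.startswith(directory)
                  and delimiter not in name[len(directory):])


names = [
    "Action/Rocky1.wmv",
    "Action/Rocky2.wmv",
    "Action/Rocky3.wmv",
    "Animation/Up.wmv",   # hypothetical sample
    "readme.txt",         # hypothetical sample
]

print(list_directories(names))
print(list_blobs_in(names, "Action/"))
```

Choosing the delimiter and prefix scheme up front matters, because renaming a "directory" later means rewriting every blob name under it.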
  • Windows Azure Storage Services

    1. Windows Azure storage services
       Duy Lam
    2. Cloud computing
    3. Windows Azure platform
       A group of cloud technologies, each providing a specific set of services to application developers
       Windows Azure Platform Offers
    4. Windows Azure
       "Compute" to run Windows applications
       "Storage" to store the data in the cloud
    5. Compute
    6. Storage
       <http|https>://<account-name>.<service><resource-path>
       Services: blob, queue, table
    7. Tables Service
       These "tables" are not relational tables. They are structured storage with a flexible schema: Tables contain a set of Entities, each of which contains a set of named Properties
       Work with LINQ, WCF (ADO.NET) Data Services and REST
       Practice:
       Visual Studio Project Template
       "Hello world" Windows Azure Table
    8. How to debug
       Observe the underlying REST requests with Fiddler: DevelopmentStorageProxyUri=http://ipv4.fiddler
    9. Fun with properties
       Timestamp property:
       Read only
       Optimistic concurrency
       PartitionKey & RowKey properties:
       Provide the uniqueness for an entity in a table
       Scale the table
       Sorting
    10. Partitioning
    11. Partitioning
    12. Other features
       Entity group transactions (EGT)
       Atomically manipulate entities in the same partition in a single transaction
       Up to 100 commands in a single transaction, with a payload < 4 MB
       Continuation tokens:
       x-ms-continuation-NextPartitionKey
       x-ms-continuation-NextRowKey
    13. Remember
       Supported query operators: From, Where, Take, First, FirstOrDefault
       Cross-table consistency
    14. Queues Service
    15. Benefits
    16. How it works
       VisibilityTimeout
    17. Notes
       Queue
       No limit on the number of messages
       A message is stored for at most a week
       Can have metadata associated in the form of <name, value> pairs (up to 8KB in size per queue)
       Messages
       Up to 8KB in size
       VisibilityTimeout: 30 seconds by default, 2 hours maximum
       PopReceipt: required when deleting a message
       No guaranteed return order of the messages from a queue
    18. (no text)
    19. Blob Service
    20. Models
    21. Notes
       Use a directory-like hierarchy for the blob names and then list all the "directories" by using the CloudBlobContainer.GetDirectoryReference() method
       Action/Rocky1.wmv
       Action/Rocky2.wmv
       Action/Rocky3.wmv
       Permission on blob containers
           // Read the storage account from the role's "DataConnection" setting.
           var storageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnection");
           // Get (and create if needed) the "movies" container.
           var blobContainer = storageAccount.CreateCloudBlobClient()
               .GetContainerReference("movies");
           blobContainer.CreateIfNotExist();
           // Make the container and its blobs publicly readable.
           var blobPermissions = blobContainer.GetPermissions();
           blobPermissions.PublicAccess = BlobContainerPublicAccessType.Container;
           blobContainer.SetPermissions(blobPermissions);
    22. (no text)
    23. Q & A
       Thank you
    24. To start with
       Install Windows Azure SDK
       Microsoft Whitepapers:
       Introducing the Windows Azure Platform
       Windows Azure Table – Programming Table Storage
       Windows Azure Blob – Programming Blob Storage
       Windows Azure Queue – Programming Queue Storage
       Windows Azure Tables and Queues Deep Dive - PDC09
       Windows Azure Blob and Drive Deep Dive - PDC09
       Neil Mackenzie blog