I delivered this presentation at CodeMash 2.0.1.0. It covers lessons learned while building an application that post-processes scientific data on the Windows Azure platform.
35. Windows Azure Platform. Application Services: Frameworks (“Dublin”, “Velocity”); Security (“Geneva”, Access Control); Connectivity (Project “Sydney”, Service Bus); Data (SQL Azure Data Sync). Compute. Storage: Table Storage, Blob Storage, Queue, Drive, Content Delivery Network.
36. Windows Azure Compute. Development, service hosting, and management environment: .NET, Java, PHP, Python, Ruby, native code (C/C++, Win32, etc.); ASP.NET providers, FastCGI, memcached, MySQL, Tomcat. Full trust: supports standard languages and APIs. Secure certificate store. Management APIs, plus logging and diagnostics systems. Multiple roles: Web, Worker, Virtual Machine (VHD). Multiple VM sizes: 1.6 GHz x64 CPU, 1.75 GB RAM, 100 Mbps network, 250 GB volatile storage; Small (1X), Medium (2X), Large (4X), X-Large (8X). In-place rolling upgrades, organized by upgrade domains: walk each upgrade domain one at a time.
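The "walk each upgrade domain one at a time" idea can be sketched in a few lines. This is only an illustration, not the Windows Azure fabric's actual behavior or API: the function names, the round-robin assignment of instances to domains, and the `upgrade` callback are all assumptions made for the example.

```python
from collections import defaultdict


def assign_upgrade_domains(instances, domain_count):
    """Spread instances across upgrade domains round-robin (assumed policy)."""
    domains = defaultdict(list)
    for index, instance in enumerate(instances):
        domains[index % domain_count].append(instance)
    return [domains[d] for d in sorted(domains)]


def rolling_upgrade(instances, domain_count, upgrade):
    """Upgrade one domain at a time; other domains keep serving meanwhile.

    `upgrade` is a hypothetical callback that upgrades a single instance.
    Returns the order in which instances were upgraded.
    """
    order = []
    for domain in assign_upgrade_domains(instances, domain_count):
        for instance in domain:
            upgrade(instance)
            order.append(instance)
    return order


if __name__ == "__main__":
    roles = ["web_0", "web_1", "web_2", "web_3", "web_4"]
    print(assign_upgrade_domains(roles, 2))
    print(rolling_upgrade(roles, 2, upgrade=lambda name: None))
```

With two upgrade domains, at most about half of the instances are offline at any moment, which is the point of organizing the upgrade by domain.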
37. Windows Azure Diagnostics. Configurable trace, performance counter, Windows event log, IIS log, and file buffering. Local data buffering quota management. Query and modify from the cloud and from the desktop, per role instance. Transfer to storage, scheduled and on demand. Filter by data type, verbosity, and time range.
38. Windows Azure Storage. Rich data abstractions: tables, blobs, queues, drives, CDN. Capacity (100 TB), throughput (100 MB/sec), transactions (1K req/sec). High accessibility: supports geo-location; language- and platform-agnostic REST APIs; URL: http://<account>.<store>.core.windows.net; client libraries for .NET, Java, PHP, etc. High durability: data is replicated 3 times within a cluster and (Feb 2010) across datacenters. High scalability: data is automatically partitioned and load balanced across servers.
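The URL pattern http://<account>.<store>.core.windows.net can be made concrete with a small helper. This is a sketch only: `storage_endpoint` is a hypothetical function name and `myaccount` is a placeholder account, and the helper builds the base address without making any request.

```python
# Storage services addressed through the <store> part of the host name.
KNOWN_STORES = ("blob", "table", "queue")


def storage_endpoint(account, store):
    """Build the base REST endpoint for one storage service of an account."""
    if store not in KNOWN_STORES:
        raise ValueError(f"unknown store {store!r}; expected one of {KNOWN_STORES}")
    return f"http://{account}.{store}.core.windows.net"


if __name__ == "__main__":
    # "myaccount" is a placeholder, not a real storage account.
    for store in KNOWN_STORES:
        print(storage_endpoint("myaccount", store))
```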
39. Windows Azure Table Storage. Designed for structured data, not relational data. Data definition is part of the application. A Table is a set of Entities (records); an Entity is a set of Properties (fields). No fixed schema: each property is stored as a <name, typed value> pair; two entities within the same table can have different properties; no schema is enforced.
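The "no fixed schema" point is easier to see with a toy model. The class below is an in-memory sketch, not the Table Storage client library: `SketchTable` and its methods are hypothetical, and the sample property names and values are made up. The one detail borrowed from the real service is that every entity is addressed by a PartitionKey and a RowKey.

```python
class SketchTable:
    """In-memory stand-in for a table: a set of entities with no enforced schema."""

    def __init__(self, name):
        self.name = name
        self._entities = {}  # (partition_key, row_key) -> {property name: typed value}

    def insert(self, partition_key, row_key, **properties):
        key = (partition_key, row_key)
        if key in self._entities:
            raise KeyError(f"entity {key} already exists")
        # Each property is kept as a <name, typed value> pair.
        self._entities[key] = dict(properties)

    def get(self, partition_key, row_key):
        return self._entities[(partition_key, row_key)]

    def property_names(self):
        """Union of the property names used by any entity in the table."""
        names = set()
        for properties in self._entities.values():
            names.update(properties)
        return names


if __name__ == "__main__":
    readings = SketchTable("readings")
    # Two entities in the same table with different properties (illustrative values).
    readings.insert("2010-01", "0001", lat=41.5, lon=-81.7, temperature=271.3)
    readings.insert("2010-01", "0002", lat=41.5, lon=-81.6, pressure=1013, note="estimated")
    print(sorted(readings.property_names()))
```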
40. Windows Azure Blob Storage. Storage for large, named files plus their metadata. Block Blob: targeted at streaming workloads; each blob consists of a sequence of blocks; each block is identified by a Block ID; size limit 200 GB per blob. Page Blob: targeted at random read/write workloads; each blob consists of an array of pages; each page is identified by its offset from the start of the blob; size limit 1 TB per blob.
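The difference between the two blob types comes down to addressing: blocks are named by ID, pages by offset. The sketch below illustrates both without calling any storage API. The helper names are hypothetical, the 4 MiB default block size is an assumption chosen for the example, and the Base64-encoded counter is just one possible Block ID scheme. The 512-byte page size reflects the alignment page blobs require.

```python
import base64

PAGE_SIZE = 512  # bytes; page blob reads and writes align to 512-byte pages


def split_into_blocks(data, block_size=4 * 1024 * 1024):
    """Return (block_id, chunk) pairs that cover `data` in order.

    block_size is an assumed value for illustration.
    """
    blocks = []
    for number, start in enumerate(range(0, len(data), block_size)):
        # Fixed-width counter, Base64-encoded, so every ID has the same length.
        block_id = base64.b64encode(f"{number:06d}".encode("ascii")).decode("ascii")
        blocks.append((block_id, data[start:start + block_size]))
    return blocks


def page_range(offset, length):
    """Return the page-aligned (first_byte, last_byte) range covering a write."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = (offset // PAGE_SIZE) * PAGE_SIZE
    last = -(-(offset + length) // PAGE_SIZE) * PAGE_SIZE - 1  # ceiling division
    return first, last


if __name__ == "__main__":
    for block_id, chunk in split_into_blocks(b"abcdefghij", block_size=4):
        print(block_id, chunk)
    print(page_range(700, 100))
```

A block blob upload would send each chunk under its ID and then commit the ordered ID list, while a page blob write touches only the aligned byte range, which is why the former suits streaming and the latter suits random access.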
41. Windows Azure Queue. Performance-efficient, highly available, and provides reliable message delivery. Asynchronous work dispatch; inter-role communication. Polling-based model; best-effort FIFO data structure. Queue operations: Create Queue, Delete Queue, List Queues, Get/Set Queue Metadata. Message operations: Add Message, Get Message(s), Peek Message(s), Delete Message.
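The message operations fit together as a poll, process, delete loop. The class below is an in-memory sketch of that pattern, not the queue service or its client library: `SketchQueue` is hypothetical, time is passed in explicitly as `now` to keep the example deterministic, and deletion uses a plain message ID where the real service uses a pop receipt. A message that is retrieved but never deleted becomes visible again after the visibility timeout, which is what makes delivery reliable when a worker crashes.

```python
import itertools


class SketchQueue:
    """In-memory stand-in for a queue with Add, Get, Peek, and Delete Message."""

    def __init__(self):
        self._messages = []            # dicts kept in arrival order
        self._ids = itertools.count(1)

    def add_message(self, body):
        self._messages.append({"id": next(self._ids), "body": body, "visible_at": 0.0})

    def peek_message(self, now):
        """Look at the next visible message without hiding it."""
        for message in self._messages:
            if message["visible_at"] <= now:
                return message["body"]
        return None

    def get_message(self, now, visibility_timeout=30.0):
        """Return (id, body) of the next visible message and hide it for a while."""
        for message in self._messages:
            if message["visible_at"] <= now:
                message["visible_at"] = now + visibility_timeout
                return message["id"], message["body"]
        return None

    def delete_message(self, message_id):
        """Remove a message once its work has completed."""
        self._messages = [m for m in self._messages if m["id"] != message_id]


if __name__ == "__main__":
    queue = SketchQueue()
    queue.add_message("flatten file A")
    got = queue.get_message(now=0.0)
    print(got)
    queue.delete_message(got[0])       # work finished, so remove the message
    print(queue.get_message(now=60.0))
```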
42. Windows Azure Drive. Provides a durable NTFS volume for Windows Azure applications to use. Use existing NTFS APIs to access a durable drive. Durability and survival of data on application failover. Enables migrating existing NTFS applications to the cloud. Drives can be up to 1 TB; a VM can dynamically mount up to 8 drives. A Windows Azure Drive is a Page Blob; for example, mount a Page Blob as X:\ from http://<account>.blob.core.windows.net/<container>/<blob>. All writes to the drive are made durable to the Page Blob. The drive is made durable through standard Page Blob replication.
43. Windows Azure Content Delivery Network. Provides high-bandwidth global blob content delivery. 18 locations globally (US, Europe, Asia, Australia, and South America), and growing. Blob service URL vs. CDN URL: Blob URL: http://<account>.blob.core.windows.net/; CDN URL: http://<guid>.vo.msecnd.net/. Support for custom domain names. Access details: blobs are cached in the CDN until the TTL passes; use per-blob HTTP Cache-Control policy for TTL (new); the CDN provides only anonymous HTTP access.
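The Blob URL and CDN URL differ only in the host name, and the TTL is expressed as an ordinary HTTP Cache-Control value. The sketch below shows both. The function names are hypothetical, and `myaccount` and `az12345` are placeholders for a real account name and the CDN-assigned <guid>.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn_url(blob_url, cdn_host):
    """Swap the blob service host for the CDN host, keeping the blob path."""
    parts = urlsplit(blob_url)
    return urlunsplit((parts.scheme, cdn_host, parts.path, parts.query, parts.fragment))


def cache_control_value(ttl_seconds):
    """Build a Cache-Control value that lets caches keep a blob for ttl_seconds."""
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must not be negative")
    return f"public, max-age={ttl_seconds}"


if __name__ == "__main__":
    # Placeholder account and CDN host names, for illustration only.
    blob_url = "http://myaccount.blob.core.windows.net/images/plot.png"
    print(to_cdn_url(blob_url, "az12345.vo.msecnd.net"))
    print(cache_control_value(3600))
```

Because the CDN serves only anonymous HTTP, this rewrite applies to blobs in publicly readable containers.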
67. Application Flow. Flatten NetCDF: Message From Q; Download Binary File; For each Time Period…; Flatten to CSV (memory); Upload to Blob Storage; Queue Table Load Job; Queue Gen Image Job. Generate Image: Message From Q; Download CSV; Generate Image; Size Image; Upload to Blob Storage; Combine with Overlay; Upload to Blob Storage; Period in Lookup Table. Table Loader: Message From Q; Download CSV; Read In Rows; For each Set of 100…; Submit Batch To Table.
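The Table Loader steps "Read In Rows", "For each Set of 100…", and "Submit Batch To Table" amount to chunking rows into fixed-size groups. The sketch below shows one way to do that. It is not the application's actual code: `batches`, `load_rows`, and the `submit_batch` callback are hypothetical names, and the sample rows are made up. The batch size of 100 presumably reflects the 100-entity limit on Table Storage batch transactions.

```python
def batches(rows, size=100):
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def load_rows(rows, submit_batch, size=100):
    """Submit rows in batches; `submit_batch` is a hypothetical callback.

    Returns the number of batches submitted.
    """
    count = 0
    for batch in batches(rows, size):
        submit_batch(batch)
        count += 1
    return count


if __name__ == "__main__":
    # Made-up rows standing in for the parsed CSV lines.
    rows = [{"row": i} for i in range(250)]
    submitted = load_rows(rows, submit_batch=lambda batch: print(len(batch)))
    print("batches:", submitted)
```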
69. If you wanted to look at all 35 TB in the form of these lat/lon plots and if…