Windows Azure: Notes From the Field
Presented on September 14, 2009 to the HUNTUG group (http://huntug.org)

  • For updates to this content please download the latest Azure Services Platform Training Kit from: http://www.azure.com
  • This is the exploding cloud diagram
  • Windows Azure runs on Windows Server 2008 with .NET 3.5 SP1. At MIX09, we opened up support for Full Trust and FastCGI. Full Trust is starred here because while it gives you access to P/Invoke into native code, that code still runs in user mode (not as administrator). For most native code that is just fine, but if you wanted to call into some Win32 APIs, for instance, it might not work in all cases because we are not running your code under a system administrator account. There are two roles in play: a web role, which is just a web site (ASP.NET, WCF, images, CSS, etc.), and a worker role, which is similar to a Windows service in that it runs in the background and can be used to decouple processing. There is a diagram later that shows the architecture, so don’t worry about how it fits together just yet. Key to point out: the inbound protocols are HTTP and HTTPS; outbound is any TCP socket (but not UDP). All servers are stateless, and all access is through load balancers.
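    The notes above describe the worker role as a background service decoupled by queues. A minimal sketch of that pattern, assuming the v1.0 Microsoft.WindowsAzure.ServiceRuntime API (the CTP-era SDK exposed similar but differently named types) and an arbitrary polling interval:

      using System;
      using System.Threading;
      using Microsoft.WindowsAzure.ServiceRuntime; // v1.0 SDK; CTP type names differed

      public class WorkerRole : RoleEntryPoint
      {
          public override void Run()
          {
              // A worker role behaves like a Windows service: a long-running
              // background loop, typically fed work by a queue.
              while (true)
              {
                  // ...dequeue a job, process it, delete the message...
                  Thread.Sleep(TimeSpan.FromSeconds(10));
              }
          }
      }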
  • This should give a short introduction to storage. Key points are that it is durable (once you write something, we write it to disk), scalable (you have multiple servers with your data), and available (as with compute, we make sure the storage service is always running; there are three instances of your data at all times). Quickly work through the different types of storage. Blobs: similar to the file system; use them to store content that changes, uploads, unstructured data, images, movies, etc. Tables: semi-structured, providing a partitioned entity store (more on partitions in the Building Azure Services talk); allows you to have tables containing billions of rows, partitioned across multiple servers. Queues: a simple queue for decoupling compute web and worker roles. All access is through a REST interface. You can actually access storage from outside of the data center (you don’t need compute), and you can access it via anything that can make an HTTP request. It also means table storage can be accessed via ADO.NET Data Services.
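    Because all storage access is plain REST, any HTTP client works, no SDK required. A minimal sketch reading a blob, assuming a hypothetical storage account with a publicly readable container:

      using System;
      using System.IO;
      using System.Net;

      class BlobRestGet
      {
          static void Main()
          {
              // Account, container, and blob names are hypothetical; a public
              // container lets anonymous GETs read its blobs.
              string url = "http://myaccount.blob.core.windows.net/maps/sample.csv";
              HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
              using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
              using (StreamReader reader = new StreamReader(response.GetResponseStream()))
              {
                  Console.WriteLine(reader.ReadToEnd());
              }
          }
      }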
  • Remind them the cloud is all the hardware across the board. Point out the automated service management.
  • The Developer SDK is a cloud in a box, allowing you to develop and debug locally without requiring a connection to the cloud. You can do this without Visual Studio, as there are command line tools for executing the “cloud in a box” and publishing to the cloud. There is also a separate download for the Visual Studio 2008 tools, which provides the VS debugging support and templates. Requirements are any version of Visual Studio (including Web Developer Express) and Vista SP1, or Win7 RC or later.

Windows Azure: Notes From the Field (Presentation Transcript)

  • Windows Azure: Notes
    From The Field
    Rob Gillen
    Computer Science Research
    Oak Ridge National Laboratory
    Planet Technologies, Inc.
  • Agenda
    Introduction to Windows Azure
    Application Overview
    What didn’t work
    What is working (or at least, we think so)
    Lessons (being) Learned
    Questions
  • About Planet Technologies
    Leader in integration and customization of Microsoft technologies, architecture, security, and management consulting
    100% Microsoft Focused Gold Partner
    Four-time Microsoft Federal Partner of the Year (05-08)
    Microsoft SLG Partner of the Year (08)
    Microsoft Public Sector Partner of the Year (06)
  • Oak Ridge National Laboratory is DOE’s largest science and energy lab
    • World’s most powerful open scientific computing facility
    • Nation’s largest concentration of open source materials research
    • $1.3B budget
    • 4,350 employees
    • 3,900 research guests annually
    • $350 million invested in modernization
    • Nation’s most diverse energy portfolio
    • Operating the world’s most intense pulsed neutron source
    • Managing the billion-dollar U.S. ITER project
  • Delivering science and technology
    Ultrascale computing
    Energy technologies
    Bioenergy
    ITER
    Neutron sciences
    Climate
    Materials at the nanoscale
    National security
    Nuclear energy
  • Ultrascale Scientific Computing
    • Leadership Computing Facility:
    • World’s most powerful open scientific computing facility
    • Jaguar XT operating at 1.64 petaflops
    • Exascale system by the end of the next decade
    • Focus on computationally intensive projects of large scale and high scientific impact
    • Addressing key science and technology issues
    • Climate
    • Fusion
    • Materials
    • Bioenergy
    The world’s most powerful system for open science
  • Unique Network Connectivity
    10 Gb/s, moving to 40 Gb/s and higher
  • Disclaimer
    Windows Azure is still in CTP. There are issues. They are making it better. This talk is simply about current experiences and hopefully some tips/pointers to help you reach success faster.
    The tests performed and referenced in this talk are not deemed scientifically accurate – simply what I have seen in my testing/usage.
    There are (many) people (much) smarter than me.
  • What is Windows Azure?
    Compute
    Storage
    Developer SDK
  • What is Windows Azure?
    Compute
    • .NET 3.5 SP1
    • Server 2008 – 64bit
    • Full Trust*
    • Web Role
    • IIS7 Web Sites (ASP.NET, FastCGI)
    • Web Services (WCF)
    • Worker Role
    • Stateless Servers
    • Http(s)
  • What is Windows Azure?
    Storage
    • Durable, scalable, available
    • Blobs
    • Tables
    • Queues
    • REST interfaces
    • Can be used without compute
  • What is Windows Azure?
    • All of the hardware
    • Hardware Load Balancers
    • Servers
    • Networks
    • DNS
    • Monitoring
    • Automated service management
  • What is Windows Azure?
    Developer SDK
    • Windows Azure SDK
    • Local compute environment
    • Local Mock Storage
    • Command line tools
    • Small Managed API
    • Logging, working storage
    • Microsoft Visual Studio 2008 add-in
  • Service Architecture (diagram): clients on the Internet reach your service in the Windows Azure datacenter through a load balancer that fronts multiple stateless web sites (ASPX, ASMX, WCF); the web roles hand work to worker services through a queue, and both web and worker roles read and write storage (tables, blobs) behind a second load balancer.
  • Initial Context
    Studying the intersection of HPC/scientific computing and the cloud
    Data locality is expected to be a key issue for us
    Cloud Computing looks to fill a niche in pre- and post-processing as well as generalized mid-range compute
    This project is an introductory, preparatory step toward the larger research project
  • Sample Application Goals
    Make CMIP3 data more accessible/consumable
    Prototype the use of cloud computing for post-processing of scientific data
    Answer the questions:
    Can cloud computing be used effectively for large-scale data?
    How accessible is the programming paradigm?
    Note: the focus is on the mechanics, not the science (we could be using the number of foobars in the world rather than temperature simulations)
  • Two-Part Problem
    Get the data into the cloud, exposed in such a way as to be consumable by generic clients in Internet-friendly formats
    Provide some sort of visualization or sample application to provide context/meaning to the data.
    Simply making the data available doesn’t solve much
    Looking at TB of date/lat/lon/temp combinations doesn’t convey much
    A visualization or sample application was required to make the data “grok-able”
  • Putting Data in the Cloud
    Source format: NetCDF is a hierarchical, n-dimensional binary format. Highly compressed and efficient. Difficult to consume in small bites over the Internet (often need to download the entire file or use OPeNDAP)
    Libraries for interacting with NetCDF are available in C, Java, Ruby, Python, etc. Rudimentary managed wrapper available on CodePlex. File format is a hurdle for the casual observer (non-domain expert).
  • Putting Data in the Cloud
    Desire to expose data as a “service” (think REST, XML, JSON, etc.)
    Decided to store in Azure Tables as a “flattened” view
    Designed to scale to billions of records
    Consumers can query and retrieve small slices of data
    Supports ADO.NET Data Services with no extra effort (ATOM)
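    A hedged sketch of what one “flattened” row might look like; the actual schema is not shown in the deck, so the property names and key choices below are assumptions. Azure Tables require PartitionKey and RowKey, and ADO.NET Data Services can expose entities shaped like this with no extra effort:

      using System;

      // Hypothetical flattened entity: one temperature reading per row.
      public class TemperatureEntity
      {
          public string PartitionKey { get; set; } // e.g. one partition per day
          public string RowKey { get; set; }       // e.g. "lat:lon" within that day
          public DateTime Timestamp { get; set; }  // maintained by the table service
          public double Latitude { get; set; }
          public double Longitude { get; set; }
          public double Temperature { get; set; }
      }

    The partition key choice is what lets the store spread billions of rows across servers while keeping a single time slice cheap to query.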
  • Context: 35 terabytes of numbers. How much data is that?
    A single latitude/longitude map at typical climate model resolution represents about 40 KB.
    If you wanted to look at all 35 TB in the form of these latitude/longitude plots, and if
    every 10 seconds you displayed another map, and
    you worked 24 hours a day, 365 days each year,
    you could complete the task in about 200 years.
  • Dataset Used
    1.2 GB NetCDF file – NCAR climate of the 20th century, run 1, daily data, air temperature, 1.4 degree grid.
    40,149 days represented
    Each day has 8,192 temperature values
    Total of 328,900,608 unique values (40,149 days × 8,192 values per day)
    0.003428% of the total dataset
  • Data Load Approach #1
    Local application flattened NetCDF in memory, loading records directly into Azure Tables using the ADO.NET Data Services client
    Initially 1 record at a time (prior to batch support)
    100-record batches (the per-batch maximum) once batch support was enabled
    Worked, but took *forever* (collect this time)
  • Data Load Approach #2
    Local application flattened NetCDF into CSV files (one per time unit - ~41,000 files)
    CSV files uploaded into Azure blob storage
    Queue populated with individual entries for each time unit
    Worker roles would grab a time period from the queue, pull in the CSV, upload the data to the tables in 100-unit batches, and delete the message from the queue (see the batch-insert sketch below).
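    A minimal sketch of the 100-entity batch insert, reusing the hypothetical TemperatureEntity from the earlier sketch and an assumed table name. Azure entity group transactions cap a batch at 100 entities sharing one partition key, and SaveChangesOptions.Batch (ADO.NET Data Services, .NET 3.5 SP1) sends them as a single request:

      using System.Collections.Generic;
      using System.Data.Services.Client; // ADO.NET Data Services (.NET 3.5 SP1)

      class TableBatchLoader
      {
          // Insert up to 100 entities that share one PartitionKey as a single
          // entity group transaction: one HTTP round trip instead of 100.
          static void SaveBatch(DataServiceContext context,
                                IEnumerable<TemperatureEntity> batch)
          {
              foreach (TemperatureEntity entity in batch)
                  context.AddObject("Temperatures", entity); // hypothetical table name

              context.SaveChanges(SaveChangesOptions.Batch);
          }
      }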
  • Data Load Approach #2
    Results
    Averaged 2:30 (min:sec) per time period
    40,149 time periods
    24 per worker hour
    1,672.8 worker-hours
    14 active workers
    119.5 calendar hours
    ~5 days total load.
    328,900,608 total entities
    Near-linear scale out
    Remember, this is 0.003428% of the total dataset
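    As a quick check, the figures above chain together: 60 minutes ÷ 2.5 minutes per period = 24 periods per worker-hour; 40,149 periods ÷ 24 ≈ 1,672.8 worker-hours; 1,672.8 ÷ 14 workers ≈ 119.5 calendar hours, or roughly 5 days.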
  • Data Load Approach #3
    Similar to #2, but initial flattening to CSVs occurs in Azure rather than local machine
    Same table load performance as #2, but doesn’t require local machine resources for flattening and uploading
    Uploading a single 1.2 GB NetCDF file is much faster than uploading ~40,100 300 KB CSV files
  • Sample Visualization Application
    Goals
    Generate heat maps for each time slice
    Animate collection of heat maps
    Allow user to compare similar time frames from various experiments to understand impact of changes
  • Visualization Approach #1
    Silverlight-based app, using CTP Virtual Earth control
    Download data by time period; for each data point (lat, lon, temp), create a bounding square (polygon) and set its color on the VE control
    Downloaded via the ADO.NET Data Services client (easy to write)
    Downloaded via JSON (harder, but less verbose)
    Store datasets in memory; allow the user to select between them, animate downloaded sets, and batch download
  • Visualization Approach #1
    Results
    ATOM is *very* bloated (~9MB per time period, average of 55 seconds over 9 distinct, serial calls)
    JSON is better (average of 18.5 seconds and 1.6MB)
    Client image rendering is *ok*…
    Polygons prevented normal VE interaction
    When interaction occurred, it was jerky
  • Demo: Silverlight-based Client Processing
  • Visualization Approach #1.5
    Attempted to go the whole “GIS” route and create a WMS or use MapCruncher
    Results
    Process worked OK, but was heavyweight and manually intensive.
    With the resolution of the data I was using, was interactivity valuable?
  • Visualization Approach #2
    Pre-generate the images for each time period
    Used a fixed-size base map
    Pre-cache images
    Silverlight and WPF viewers would use an animation to cycle through the image collection
  • Visualization Approach #2
    Results
    Image Generation worked fine (smoother than VE)
    Both the Silverlight and WPF desktop apps choked on animations when the number of images got large (i.e., > 100)
  • Visualization Approach #3
    Same approach as #2, but generate video (i.e. WMV)
    Results
    Significantly improved rendering performance
    Supports streaming
  • Demo: WPF Client Image Animation and pre-rendered video
  • Sidebar: Generating Heatmaps
    Create an image using GDI+ and set the appropriate pixels to a shade of gray from 0-255
    Apply a color map that translates from a gray to a color in a reference image
    (Yes… you have to care about pixels…)
  • Sidebar: Generating Heat Maps
    Rudimentary math, but processor-intensive for generating each image. (There’s likely a better way…)
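    A minimal GDI+ sketch of the gray-then-palette process described above. Assumptions: values are normalized against a [min, max] range and the color map is a 256×1 reference strip; both are guesses at details the slides leave out. The per-pixel GetPixel/SetPixel calls are exactly what makes this slow; LockBits-style raw access would likely be the “better way” hinted at:

      using System;
      using System.Drawing; // GDI+

      class HeatMap
      {
          // Translate each value to a 0-255 gray level, then to a color by
          // indexing a 256x1 palette strip (the reference image on the slide).
          static Bitmap Render(double[,] temps, double min, double max, Bitmap palette)
          {
              int w = temps.GetLength(0), h = temps.GetLength(1);
              Bitmap bmp = new Bitmap(w, h);
              for (int x = 0; x < w; x++)
              {
                  for (int y = 0; y < h; y++)
                  {
                      int gray = (int)(255 * (temps[x, y] - min) / (max - min));
                      gray = Math.Max(0, Math.Min(255, gray)); // clamp outliers
                      // SetPixel/GetPixel are slow per-pixel calls; LockBits is faster.
                      bmp.SetPixel(x, y, palette.GetPixel(gray, 0));
                  }
              }
              return bmp;
          }
      }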
  • Current Application Workflow
    NetCDF file (source) uploaded to blob storage
    NetCDF file split into thousands of CSV files stored in blob storage
    Process generates a LoadTable command for each CSV created
    LoadTable workers process jobs and load CSV data into Azure Tables.
    Once a CSV file has been processed, a CreateImage job is created
  • Current Application Workflow
    CreateImage workers process queue, generating a heat map image for each time set
    Once all data is loaded and images are created, a video is rendered based on the resulting images and used for inclusion in visualization applications.
    Each source image is “munged” with a base map image prior to loading into the video.
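    A hedged sketch of the hand-off between the LoadTable and CreateImage steps above, assuming the v1.0 StorageClient library (the CTP sample library used different type names); the queue name and message format are hypothetical:

      using Microsoft.WindowsAzure;
      using Microsoft.WindowsAzure.StorageClient; // v1.0 SDK; CTP library differed

      class JobChainer
      {
          // After a LoadTable job finishes, enqueue a CreateImage job for the
          // same time period so the image workers can pick it up.
          static void OnCsvLoaded(CloudStorageAccount account, string timePeriod)
          {
              CloudQueueClient client = account.CreateCloudQueueClient();
              CloudQueue queue = client.GetQueueReference("createimage"); // hypothetical
              queue.CreateIfNotExist();
              queue.AddMessage(new CloudQueueMessage(timePeriod));
          }
      }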
  • Technologies Utilized
    Windows Azure (tables, blobs, queues, web roles, worker roles)
    OGDI (http://ogdisdk.cloudapp.net/)
    C#, F#, PowerShell, DirectX, Silverlight, WPF, Bing Maps (Virtual Earth), GDI+, ADO.NET Data Services
    http://sciencecloud.us/test/silverlightapplication1testpage.aspx
  • Lessons
    Cloud-focused data formats are large.
    Single ~1.2 GB NetCDF == ~16 GB of CSV
    Table load time is “slow”
    ~8,200 records, over 82 batches, average 2:30
    However, insert time remains linear
    The set of partition keys is not queryable (you can’t ask a table for its distinct keys)… store them separately.
    Load times prevent Azure tables from being particularly well-suited for large-scale data
    Watch your compilation model (32- vs. 64-bit); a runtime check is sketched below
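    Azure VMs run 64-bit Server 2008, so an x86-only assembly or native dependency that works on a 32-bit dev box can fail in the cloud. A one-line runtime check that works on .NET 3.5 (Environment.Is64BitProcess arrived in a later framework):

      using System;

      class Bitness
      {
          static void Main()
          {
              // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit one.
              Console.WriteLine(IntPtr.Size == 8 ? "64-bit" : "32-bit");
          }
      }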
  • Lessons
    Errors happen… plan for/expect them
    Watch for timeouts when retrieving files, uploading data, etc. (code sample; see the retry sketch after this list)
    Design for idempotency:
    applying the operation multiple times does not change the result
    Assume your worker roles will get restarted.
    Azure deployments will fail when you least want them to (remember, it’s a CTP).
    Stay away from dev storage (the local fabric)
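    The slide points at a code sample for timeouts; here is a minimal retry-with-backoff sketch in that spirit (the attempt count and delays are illustrative, not from the talk). It only makes sense when the wrapped operation is idempotent, per the bullet above:

      using System;
      using System.Threading;

      static class Retry
      {
          // Run an operation, retrying transient failures with exponential backoff.
          public static T Execute<T>(Func<T> operation)
          {
              const int maxAttempts = 5; // illustrative
              for (int attempt = 1; ; attempt++)
              {
                  try
                  {
                      return operation();
                  }
                  catch (Exception)
                  {
                      if (attempt >= maxAttempts)
                          throw; // give up after the final attempt
                      // Back off 2, 4, 8, 16 seconds between attempts.
                      Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
                  }
              }
          }
      }

    Usage might look like Retry.Execute(() => DownloadCsv(blobUrl)), where DownloadCsv is your own (hypothetical) helper.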
  • Lessons
    ATOM is convenient, but bloated – use JSON where possible (see the JSON sketch after this list)
    Data transfer within Azure datacenters is fast. Use web roles to format/proxy data for transfer over the Internet
    Azure logs are very slow – use alternate reporting methods if a faster feedback loop is necessary
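    A minimal sketch of re-serializing a table slice as JSON in a web-role proxy, using JSON.NET (linked under Related Content) and the hypothetical TemperatureEntity from the earlier sketch; the ~9 MB ATOM vs. ~1.6 MB JSON measurements above are the motivation:

      using System.Collections.Generic;
      using Newtonsoft.Json; // JSON.NET (see Related Content)

      class JsonProxy
      {
          // A web role can query table storage over ATOM inside the datacenter
          // (fast, cheap) and hand compact JSON to Internet clients.
          static string ToJson(IEnumerable<TemperatureEntity> slice)
          {
              return JsonConvert.SerializeObject(slice);
          }
      }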
  • Related Content
    NetCDF: http://www.unidata.ucar.edu/software/netcdf/
    NetCDF Wrapper for .NET: http://netcdf.codeplex.com/
    OPeNDAP: http://www.opendap.org/
    CMIP3: http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php
    Open Government Data Initiative: http://ogdisdk.cloudapp.net/
    JSON.NET: http://json.codeplex.com/
    MapCruncher: http://www.microsoft.com/maps/product/mapcruncher.aspx
    Heat maps for VE: http://johanneskebeck.spaces.live.com/blog/cns!42E1F70205EC8A96!7742.entry?wa=wsignin1.0&sa=406128337
    Heat maps in C#: http://dylanvester.com/post/Creating-Heat-Maps-with-NET-20-%28C-Sharp%29.aspx
  • Related Content
    Silverlight 3 and Data Paging
    With ATOM: http://rob.gillenfamily.net/post/Silverlight-and-Azure-Table-Data-Paging.aspx
    With JSON: http://rob.gillenfamily.net/post/SilverLight-and-Paging-with-Azure-Data.aspx
    AtomPub, JSON, Azure, and Large Datasets
    Part 1: http://rob.gillenfamily.net/post/AtomPub-JSON-Azure-and-Large-Datasets.aspx
    Part 2: http://rob.gillenfamily.net/post/AtomPub-JSON-Azure-and-Large-Datasets-Part-2.aspx
  • Questions
    Rob Gillen
    Email: rob@gillenfamily.net
    Blog: http://rob.gillenfamily.net
    Twitter: @argodev