Azure: Lessons From The Field

This is a presentation I delivered at CodeMash 2.0.1.0 dealing with lessons learned while building an application for handling the post-processing of scientific data using the Windows Azure platform.

Presentation Transcript

    • Lessons from the Field:
      Azure for Science
      Rob Gillen
      gillenre@ornl.gov
      rob.gillenfamily.net
      @argodev
    • Agenda
      Introductions
      • Why is ORNL looking at Cloud Computing
      • Azure in 5 minutes
      Post-Processing and Data Distribution in the Cloud
      • Using Cloud Computing for Post-Processing
      • Data hosting/distribution
      Lessons (being) Learned
      • General Lessons
      • Performance
    • Oak Ridge National Laboratory is DOE’s largest science and energy lab
      • World’s most powerful open scientific computing facility
      • Nation’s largest concentration of open source materials research
      • $1.6B budget
      • 4,350 employees
      • 3,900 research guests annually
      • $350 million invested in modernization
      • Nation’s most diverse energy portfolio
      • Operating the world’s most intense pulsed neutron source
      • Managing the billion-dollar U.S. ITER project
    • Delivering science and technology
      Ultrascale computing
      Energy technologies
      Bioenergy
      ITER
      Neutron sciences
      Climate
      Materials at the nanoscale
      National security
      Nuclear energy
    • Ultrascale Scientific Computing
      • Leadership Computing Facility:
      • World’s most powerful open scientific computing facility
      • Peak speed of 2.33 petaflops (> two thousand trillion calculations/sec)
      • 18,688 nodes, 224,526 compute cores, 299 TB RAM, 10,000 TB Disk
      • 4,352 ft2 floor space
      • Exascale system by the end of the next decade
      • Focus on computationally intensive projects of large scale and high scientific impact
      • Addressing key science and technology issues
      • Climate
      • Fusion
      • Materials
      • Bioenergy
      • Home of the 1st and 3rd fastest supercomputers in the world.
      The world’s most powerful system for open science
    • Then Why Look at Cloud Computing???
      Science Takes Different Forms
      • Tight Simulations
      • Data-Parallelized
      • Embarrassingly Parallel
      Dearth of Mid-Range Assets
      • 256-1,000 cores
      • 1 of many possible solutions
      Scaling Issues
      • Power Consumption
      • Programming Struggles
      • Fault-Tolerance
      Forward-Looking
      • Next-Generation Problems
      • Next-Generation Researchers
    • Types of Clouds
      Layer                    Private (On-Premise)   Infrastructure (as a Service)   Platform (as a Service)
      Applications             You manage             You manage                      You manage
      Runtimes                 You manage             You manage                      Managed by vendor
      Security & Integration   You manage             You manage                      Managed by vendor
      Databases                You manage             You manage                      Managed by vendor
      Servers                  You manage             You manage                      Managed by vendor
      Virtualization           You manage             Managed by vendor               Managed by vendor
      Server HW                You manage             Managed by vendor               Managed by vendor
      Storage                  You manage             Managed by vendor               Managed by vendor
      Networking               You manage             Managed by vendor               Managed by vendor
    • Types of Clouds
      Private (On-Premise) | Infrastructure (as a Service) | Platform (as a Service)
    • Windows Azure Platform
      Application Services
      Frameworks: “Dublin”, “Velocity”
      Security: “Geneva”, Access Control
      Connectivity: Project “Sydney”, Service Bus
      Data: SQL Azure Data Sync
      Compute
      Storage: Table Storage, Blob Storage, Queue, Drive, Content Delivery Network
    • Windows Azure Compute
      Development, service hosting, & management environment
      .NET, Java, PHP, Python, Ruby, native code (C/C++, Win32, etc.)
      ASP.NET providers, FastCGI, memcached, MySQL, Tomcat
      Full-trust – supports standard languages and APIs
      Secure certificate store
      Management APIs, logging, and diagnostics systems
      Multiple roles – Web, Worker, Virtual Machine (VHD)
      Multiple VM sizes
      1.6 GHz x64 CPU, 1.75 GB RAM, 100 Mbps network, 250 GB volatile storage
      Small (1X), Medium (2X), Large (4X), X-Large (8X)
      In-place rolling upgrades, organized by upgrade domains
      Walk each upgrade domain one at a time
      Compute
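A role in the v1 SDK is a class deriving from RoleEntryPoint. Below is a minimal, hypothetical worker-role skeleton (not the application's actual code) showing where the processing loop and start-up tuning live:

```csharp
using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical worker-role skeleton (Azure SDK v1-era API).
public class PostProcessingWorker : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Raise the client-side connection limit so parallel storage
        // calls are not serialized by System.Net.
        ServicePointManager.DefaultConnectionLimit = 12;
        return base.OnStart();
    }

    public override void Run()
    {
        // Worker roles loop forever, pulling jobs from a queue.
        while (true)
        {
            // ... dequeue a message and process it here ...
            Thread.Sleep(1000); // back off briefly when idle
        }
    }
}
```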
    • Windows Azure Diagnostics
      Configurable trace, performance counter, Windows event log, IIS log & file buffering
      Local data buffering quota management
      Query & modify from the cloud and from the desktop per role instance
      Transfer to storage scheduled & on-demand
      Filter by data type, verbosity & time range
      Compute
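As a concrete illustration of the buffering and scheduled transfer above, here is a minimal sketch using the v1 diagnostics API; the specific counter, sample rate, and transfer period are illustrative assumptions, not the configuration used in the talk:

```csharp
using System;
using Microsoft.WindowsAzure.Diagnostics;

public static class DiagnosticsSetup
{
    // Typically called from a role's OnStart().
    public static void Configure()
    {
        DiagnosticMonitorConfiguration config =
            DiagnosticMonitor.GetDefaultInitialConfiguration();

        // Sample CPU utilization every 5 seconds into the local buffer.
        config.PerformanceCounters.DataSources.Add(
            new PerformanceCounterConfiguration
            {
                CounterSpecifier = @"\Processor(_Total)\% Processor Time",
                SampleRate = TimeSpan.FromSeconds(5)
            });

        // Only scheduled (or on-demand) transfers leave the instance:
        // "only take what you can eat."
        config.PerformanceCounters.ScheduledTransferPeriod =
            TimeSpan.FromMinutes(1);

        DiagnosticMonitor.Start("DiagnosticsConnectionString", config);
    }
}
```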
    • Windows Azure Storage
      Rich data abstractions – tables, blobs, queues, drives, CDN
      Capacity (100 TB), throughput (100 MB/sec), transactions (1K req/sec)
      High accessibility
      Supports geo-location
      Language & platform agnostic REST APIs
      URL: http://<account>.<store>.core.windows.net
      Client libraries for .NET, Java, PHP, etc.
      High durability – data is replicated 3 times within a cluster, and (Feb 2010) across datacenters
      High scalability – data is automatically partitioned and load balanced across servers
      Storage
      Storage
    • Windows Azure Table Storage
      Designed for structured data, not relational data
      Data definition is part of the application
      A Table is a set of Entities (records)
      An Entity is a set of Properties (fields)
      No fixed schema
      Each property is stored as a <name, typed value> pair
      Two entities within the same table can have different properties
      No schema is enforced
      Table Storage
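To make the "schema lives in the application" point concrete, here is a sketch with a hypothetical lat/lon/temperature entity (property and table names are assumptions); TableServiceEntity supplies the PartitionKey, RowKey, and Timestamp properties:

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical entity: one reading from a flattened NetCDF file.
public class DataPoint : TableServiceEntity
{
    public DataPoint() { } // required by the serializer

    public DataPoint(string sourceFile, long row)
    {
        PartitionKey = sourceFile;        // e.g. the source file name
        RowKey = row.ToString("d10");     // zero-padded so rows sort in order
    }

    public double Latitude { get; set; }
    public double Longitude { get; set; }
    public double Temperature { get; set; }
}

// ADO.NET Data Services context for the table (create the table first
// with CloudTableClient.CreateTableIfNotExist("DataPoints")).
public class DataPointContext : TableServiceContext
{
    public DataPointContext(string baseAddress, StorageCredentials credentials)
        : base(baseAddress, credentials) { }

    public void AddPoint(DataPoint p)
    {
        AddObject("DataPoints", p);
        SaveChanges();
    }
}
```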
    • Windows Azure Blob Storage
      Storage for large, named files plus their metadata
      Block Blob
      Targeted at streaming workloads
      Each blob consists of a sequence of blocks
      Each block is identified by a Block ID
      Size limit 200GB per blob
      Page Blob
      Targeted at random read/write workloads
      Each blob consists of an array of pages
      Each page is identified by its offset from the start of the blob
      Size limit 1TB per blob
      Blob Storage
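A sketch of a simple block-blob upload with the .NET client library (connection string, container, and path are placeholders):

```csharp
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class BlobUpload
{
    public static void UploadCsv(string connectionString, string localPath)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient client = account.CreateCloudBlobClient();

        CloudBlobContainer container = client.GetContainerReference("csvdata");
        container.CreateIfNotExist();

        // Uploads as a block blob; large files are split into blocks.
        CloudBlob blob = container.GetBlobReference(Path.GetFileName(localPath));
        using (FileStream fs = File.OpenRead(localPath))
        {
            blob.UploadFromStream(fs);
        }
    }
}
```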
    • Windows Azure Queue
      Efficient, highly available, reliable message delivery
      Asynchronous work dispatch
      Inter-role communication
      Polling based model; best-effort FIFO data structure
      Queue operations
      Create Queue
      Delete Queue
      List Queues
      Get/Set Queue Metadata
      Message operations
      Add Message
      Get Message(s)
      Peek Message(s)
      Delete Message
      Queue
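The operations above map directly onto the client library; a minimal sketch (queue name and message body are illustrative):

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class QueueExample
{
    public static void Demo(CloudStorageAccount account)
    {
        CloudQueueClient client = account.CreateCloudQueueClient();
        CloudQueue queue = client.GetQueueReference("imagegen-jobs");
        queue.CreateIfNotExist();

        // Asynchronous work dispatch: enqueue the name of a CSV to process.
        queue.AddMessage(new CloudQueueMessage("flattened/file0001.csv"));

        // GetMessage hides the message for a visibility timeout; if it is
        // not deleted in time it reappears (hence best-effort FIFO and the
        // need for idempotent processing).
        CloudQueueMessage msg = queue.GetMessage();
        if (msg != null)
        {
            // ... process msg.AsString ...
            queue.DeleteMessage(msg);
        }
    }
}
```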
    • Windows Azure Drive
      Provides a durable NTFS volume for Windows Azure applications to use
      Use existing NTFS APIs to access a durable drive
      Durability and survival of data on application failover
      Enables migrating existing NTFS applications to the cloud
      Drives can be up to 1TB; a VM can dynamically mount up to 8 drives
      A Windows Azure Drive is a Page Blob
      Example, mount Page Blob as X:
      http://<account>.blob.core.windows.net/<container>/<blob>
      All writes to drive are made durable to the Page Blob
      Drive made durable through standard Page Blob replication
      Drive
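A sketch of the create-and-mount cycle, assuming the v1 CloudDrive API; the blob URI, size, and cache settings are placeholders:

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class DriveExample
{
    public static string MountDataDrive(CloudStorageAccount account)
    {
        // The drive is just a page blob holding a VHD.
        CloudDrive drive = account.CreateCloudDrive(
            "http://<account>.blob.core.windows.net/drives/data.vhd");

        try { drive.Create(1024); }                // size in MB
        catch (CloudDriveException) { /* already exists */ }

        // Returns the mounted path (e.g. "X:\"), usable via normal
        // NTFS APIs; a read cache can be set up with InitializeCache.
        return drive.Mount(0, DriveMountOptions.None);
    }
}
```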
    • Windows Azure Content Delivery Network
      Provides high-bandwidth global blob content delivery
      18 locations globally (US, Europe, Asia, Australia and South America), and growing
      Blob service URL vs. CDN URL
      Blob URL: http://<account>.blob.core.windows.net/
      CDN URL: http://<guid>.vo.msecnd.net/
      Support for custom domain names
      Access details
      Blobs are cached in CDN until the TTL passes
      Use per-blob HTTP Cache-Control policy for TTL (new)
      CDN provides only anonymous HTTP access
      Content Delivery Network
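A sketch of the per-blob Cache-Control policy mentioned above (blob name and TTL are illustrative):

```csharp
using Microsoft.WindowsAzure.StorageClient;

public static class CdnTtl
{
    public static void SetTtl(CloudBlobContainer container)
    {
        // The blob must already exist; the CDN honors this header as its TTL.
        CloudBlob blob = container.GetBlobReference("maps/overlay0001.png");
        blob.Properties.CacheControl = "public, max-age=3600"; // 1-hour TTL
        blob.SetProperties();
    }
}
```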
    • Tenets of Internet-Scale Application Architecture
      Design
      • Horizontal scaling
      • Service-oriented composition
      • Eventual consistency
      • Fault tolerant (expect failures)
      Security
      • Claims-based authentication & access control
      • Federated identity
      • Data encryption & key mgmt.
      Management
      • Policy-driven automation
      • Aware of application lifecycles
      • Handle dynamic data schema and configuration changes
      Data & Content
      • De-normalization
      • Logical partitioning
      • Distributed in-memory cache
      • Diverse data storage options (persistent & transient, relational & unstructured, text & binary, read & write, etc.)
      Processes
      • Loosely coupled components
      • Parallel & distributed processing
      • Asynchronous distributed communication
      • Idempotent (handle duplicate messages)
      • Isolation (separation of concerns)
    • Application Goals
      Simulate Post-Processing of Scientific Data
      • Generate Visualizations from “raw” data
      • Transform data to be consumable by general processes
      • Exercise various storage mechanisms
      Focus on Mechanics
      • The specific science problem being solved is secondary to the approach
      • The goal is to refine the approach until it fades into the background, allowing the science to regain preeminence
    • Putting Data Into the Cloud
      Source Data
      • NetCDF files – subset of US contribution to CMIP3 archive
      Visualization Support
      • Flatten Source Files to CSV
      • Generate base “heat map”
      • Combine heat map and base map
      • Generate Video/Animation
      General Consumption/Publishing
      • Expose data as a “service” (REST/XML/JSON, etc.)
      • Query-able
      • Azure Tables (OGDI) / Azure Blob
    • Application Patterns
      [Diagram: two canonical patterns. Grid/parallel computing application: users on a Silverlight application, web browser, mobile browser, or WPF application reach ASP.NET and web-service front ends (web roles), which dispatch jobs to worker roles backed by the Table Storage, Blob Storage, and Queue services. Enterprise application: private-cloud ASP.NET web roles and public services are bridged by the Service Bus, Access Control Service, and Workflow Service to the enterprise web service, enterprise data, and enterprise identity, with user data, application data, and reference data held in the storage services.]
    • Application Flow
      [Diagram: three queue-driven worker pipelines.
      Flatten NetCDF: message from queue → download binary file → read in rows → flatten to CSV (in memory) → upload to blob storage → queue Gen Image job → queue Table Load job.
      Generate Image: message from queue → download CSV → for each time period: generate image, size image, combine with overlay, upload to blob storage; record period in lookup table.
      Table Loader: message from queue → download CSV → for each set of 100: submit batch to table.]
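The shape of one such pipeline stage, sketched with the storage client library (queue names, blob names, and the processing step are hypothetical):

```csharp
using Microsoft.WindowsAzure.StorageClient;

public static class PipelineStage
{
    // Pull a job, fetch its input blob, process it, hand the result to
    // the next stage, and only then acknowledge the message.
    public static void ProcessOne(CloudQueue inQueue, CloudQueue nextQueue,
                                  CloudBlobContainer container)
    {
        CloudQueueMessage msg = inQueue.GetMessage();
        if (msg == null) return;                 // queue empty; poll again later

        CloudBlob input = container.GetBlobReference(msg.AsString);
        string csv = input.DownloadText();       // e.g. a flattened CSV

        // ... generate the image / build the table batch from 'csv' ...

        nextQueue.AddMessage(new CloudQueueMessage(msg.AsString));
        inQueue.DeleteMessage(msg);              // delete only after success
    }
}
```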
    • Context
      35 TB of numbers – How Much Data Is That?
      • A single lat/lon map at typical climate model resolution represents ~40 KB
      • If you wanted to look at all 35 TB in the form of these lat/lon plots and if…
      • Every 10 seconds you displayed another map
      • You worked 24 hours/day, 365 days/year
      • You could complete the task in about 200 years.
      Dataset Used
      • 1 NetCDF file, approximately 92 MB, located in blob storage
      • 1,825 CSV files generated.
      • 815.84 MB total
      • Average file size is around 457.76 KB
      • Each CSV represented 12,690 data points (lat/lon/temp)
      • 3,650 images generated
      • 145.03 MB total
      • Heat Maps avg. 31.25 KB
      • Combined images avg. 49 KB
      • 23,652,000 entities added to Azure Table storage
    • Lessons
      Performance Counters
      • Take advantage of the new logging infrastructure within Azure to understand how your application is behaving.
      • However, like food at the dinner table, only take what you can eat.
    • Flatten Operation – Proc utilization ~16% during active work
    • Image Generation – Proc utilization ~95% during active work
    • Table Load – Proc utilization ~57% during active work
    • Lessons
      Performance Counters
      • Take advantage of the new logging infrastructure within Azure to understand how your application is behaving.
      • However, like food at the dinner table, only take what you can eat.
      Tracing Infrastructure
      • Huge improvements from CTP to v1
      • Use categories to filter / limit what you transfer out
      • My eyes were bigger than my stomach
      Table Maintenance
      • (nodes * counters) + (nodes * trace) == lots of data
      • Plan early for how you are going to maintain Wad* tables.
      • Remember… redundancy/availability has a cost. (Perf)
    • Flatten: CSV Upload Time
      Over 40,349 attempts, 249.99 ms (79.12 ms) with a rate of 15.63 Mbit/s (4.74).
      Avg File size: 457.76 KB
    • Flatten: CSV Upload Rate
      Over 40,349 attempts, 249.99 ms (79.12 ms) with a rate of 15.63 Mbit/s (4.74).
      Avg File size: 457.76 KB
    • Flatten: Queue Insert Duration
      Over 40,345 attempts, given a message size of 616 B, insertion time averaged 254.96 ms (68.86)
    • Flatten: Single Table Entity Insert
      Over 40,353 attempts, average insertion time of 248.63 ms (108.16)
    • ImageGen: CSV File Download Duration
      Over 40,349 attempts, 249.99 ms (79.12 ms) with a rate of 15.63 Mbit/s (4.74).
      Avg File size: 457.76 KB
    • ImageGen: CSV File Download Rate
      Over 40,349 attempts, 249.99 ms (79.12 ms) with a rate of 15.63 Mbit/s (4.74).
      Avg File size: 457.76 KB
    • ImageGen: Image Generation and Resizing
      Over 24,687 attempts, average generation time was 3.7s (0.283s)
    • ImageGen: Image File Upload Duration
      Over 24,688 attempts, 88.14 ms (44.84 ms) with a rate of 3.02 Mbit/s (0.614).
      Avg File size: 32 KB
    • ImageGen: Image File Upload Rate
      Over 24,688 attempts, 88.14 ms (44.84 ms) with a rate of 3.02 Mbit/s (0.614).
      Avg File size: 32 KB
    • TableLoad: Batch Insert Rate
      Over 89,202 batches (100 records each), average duration was 1.447s (0.316s)
    • Lessons
      Data
      • Generic formats tend to be large (a 92 MB NetCDF file became 816 MB of CSV)
      • Data transfer within Azure datacenter is fast (from your computer is slow)
      • Think about transport overhead (ATOM/JSON/CSV/etc. – 9x larger)
      • Use async calls for data uploads/downloads (use your CPU cycles wisely – you are paying for them)
      Azure Tables
      • Inserts/Deletes are slow but relatively linear
      • Partition keys are not queryable… store them
      • Not well suited for “changing” data
      • If you are using the client library/ADO.NET Data Services, be careful of how you handle async calls – you can lose context
      • Use batch updates wherever possible (1 insert takes 0.24863 s while a 100-entity batch takes 1.447 s, so 6 individual updates already take longer than 100 in a single batch; see the sketch below)
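A sketch of the batch path, reusing the hypothetical DataPoint entity from the table-storage sketch earlier; an entity-group transaction commits up to 100 entities sharing one partition key in a single round trip:

```csharp
using System.Data.Services.Client;
using Microsoft.WindowsAzure.StorageClient;

public static class TableLoader
{
    // 'batch' must hold at most 100 entities, all with the same PartitionKey.
    public static void LoadBatch(TableServiceContext context, DataPoint[] batch)
    {
        foreach (DataPoint p in batch)
        {
            context.AddObject("DataPoints", p);
        }
        // One entity-group transaction instead of 100 round trips.
        context.SaveChanges(SaveChangesOptions.Batch);
    }
}
```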
    • Lessons
      General
      • Timeouts happen – expect and plan for them (exponential back-off & retry policies; see the sketch below)
      • Design for Idempotency
      • Watch your compilation model (x86 vs. x64)
      • Data transfer within Azure datacenter is fast (from your computer is slow)
      • Don’t re-invent the wheel – use the available tools when practical
      • PowerShell, PowerPivot, LogParser, and the .NET charting libraries are your friends.
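One possible shape for the back-off-and-retry policy from the first bullet; the attempt count, delays, and the exception caught are illustrative (real code would also catch the storage client's transient exceptions):

```csharp
using System;
using System.Threading;

public static class Retry
{
    public static void WithBackOff(Action operation)
    {
        const int maxAttempts = 5;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                operation();
                return;
            }
            catch (TimeoutException)
            {
                if (attempt == maxAttempts) throw;
                // Exponential back-off: 2, 4, 8, 16 seconds...
                Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }
}

// Usage: Retry.WithBackOff(() => blob.UploadFromStream(stream));
```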
    • Thank you
      gillenre@ornl.gov
      rob.gillenfamily.net
      @argodev
    • The Microsoft Cloud
      Data Center Infrastructure
    • The Microsoft Cloud
      ~100 Globally Distributed Data Centers
      Quincy, WA
      Chicago, IL
      San Antonio, TX
      Dublin, Ireland
      Generation 4 DCs