Azure: Lessons From The Field

This is a presentation I delivered at CodeMash 2.0.1.0 dealing with lessons learned while building an application for handling the post-processing of scientific data using the Windows Azure platform.

Transcript

  • Lessons from the Field:
    Azure for Science
    Rob Gillen
    gillenre@ornl.gov
    rob.gillenfamily.net
    @argodev
  • Agenda
    Introductions
    • Why is ORNL looking at Cloud Computing
    • Azure in 5 minutes
    Post-Processing and Data Distribution in the Cloud
    • Using Cloud Computing for Post-Processing
    • Data hosting/distribution
    Lessons (being) Learned
    • General Lessons
    • Performance
  • Oak Ridge National Laboratory is DOE’s largest science and energy lab
    • World’s most powerful open scientific computing facility
    • Nation’s largest concentration of open source materials research
    • $1.6B budget
    • 4,350 employees
    • 3,900 research guests annually
    • $350 million invested in modernization
    • Nation’s most diverse energy portfolio
    • Operating the world’s most intense pulsed neutron source
    • Managing the billion-dollar U.S. ITER project
  • Delivering science and technology
    Ultrascale computing
    Energy technologies
    Bioenergy
    ITER
    Neutron sciences
    Climate
    Materials at the nanoscale
    National security
    Nuclear energy
  • Ultrascale Scientific Computing
    • Leadership Computing Facility:
      • World’s most powerful open scientific computing facility
      • Peak speed of 2.33 petaflops (> two thousand trillion calculations/sec)
      • 18,688 nodes, 224,526 compute cores, 299 TB RAM, 10,000 TB Disk
      • 4,352 ft² floor space
      • Exascale system by the end of the next decade
      • Focus on computationally intensive projects of large scale and high scientific impact
    • Addressing key science and technology issues
      • Climate
      • Fusion
      • Materials
      • Bioenergy
    • Home of the 1st and 3rd fastest supercomputers in the world.
    The world’s most powerful system for open science
  • Then Why Look at Cloud Computing?
    Science Takes Different Forms
    • Tight Simulations
    • Data-Parallelized
    • Embarrassingly Parallel
    Dearth of Mid-Range Assets
    • 256-1,000 cores
    • One of many possible solutions
    Scaling Issues
    • Power Consumption
    • Programming Struggles
    • Fault Tolerance
    Forward-Looking
    • Next-Generation Problems
    • Next-Generation Researchers
  • Types of Clouds
    Private (On-Premise), Infrastructure (as a Service), Platform (as a Service)
    Stack layers, top to bottom: Applications, Runtimes, Security & Integration, Databases, Servers, Virtualization, Server HW, Storage, Networking
    • Private (On-Premise): you manage the whole stack
    • Infrastructure (as a Service): you manage the upper layers; the lower layers are managed by the vendor
    • Platform (as a Service): you manage the applications; everything below them is managed by the vendor
  • Windows Azure Platform
    Application Services
    • Frameworks: “Dublin”, “Velocity”
    • Security: “Geneva”, Access Control
    • Connectivity: Project “Sydney”, Service Bus
    Data
    • SQL Azure Data Sync
    Compute
    Storage
    • Table Storage, Blob Storage, Queue, Drive, Content Delivery Network
  • Windows Azure Compute
    Development, service hosting, & management environment
    .NET, Java, PHP, Python, Ruby, native code (C/C++, Win32, etc.)
    ASP.NET providers, FastCGI, memcached, MySQL, Tomcat
    Full trust – supports standard languages and APIs
    Secure certificate store
    Management APIs, and logging and diagnostics systems
    Multiple roles – Web, Worker, Virtual Machine (VHD)
    Multiple VM sizes
      Small (1X): 1.6 GHz CPU x64, 1.75GB RAM, 100Mbps network, 250GB volatile storage
      Medium (2X), Large (4X), X-Large (8X)
    In-place rolling upgrades, organized by upgrade domains
      Walk each upgrade domain one at a time
  • Windows Azure Diagnostics
    Configurable trace, performance counter, Windows event log, IIS log & file buffering
    Local data buffering quota management
    Query & modify from the cloud and from the desktop per role instance
    Transfer to storage scheduled & on-demand
    Filter by data type, verbosity & time range
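    Filtering by data type, verbosity, and time range before the transfer to storage is what keeps the transferred volume manageable. Below is a minimal, SDK-free Python sketch of such a filter. The `LogRecord` type, the `Level` scale, and `select_for_transfer` are hypothetical stand-ins; the real Windows Azure Diagnostics settings are configured through its .NET APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Iterable, List, Set


class Level(IntEnum):
    # Hypothetical verbosity scale: a lower value means more severe.
    CRITICAL = 1
    ERROR = 2
    WARNING = 3
    INFO = 4
    VERBOSE = 5


@dataclass
class LogRecord:
    kind: str          # data type, e.g. "trace" or "perfcounter" (illustrative names)
    level: Level
    timestamp: datetime
    message: str


def select_for_transfer(records: Iterable[LogRecord], kinds: Set[str], max_level: Level,
                        start: datetime, end: datetime) -> List[LogRecord]:
    """Keep records of the wanted kinds, at or above the given severity,
    whose timestamps fall in the half-open window [start, end)."""
    return [
        r for r in records
        if r.kind in kinds and r.level <= max_level and start <= r.timestamp < end
    ]


now = datetime(2010, 1, 14, 12, 0)
logs = [
    LogRecord("trace", Level.ERROR, now - timedelta(minutes=5), "upload timed out"),
    LogRecord("trace", Level.VERBOSE, now - timedelta(minutes=4), "read 100 rows"),
    LogRecord("perfcounter", Level.INFO, now - timedelta(minutes=3), "cpu=95"),
    LogRecord("trace", Level.WARNING, now - timedelta(hours=2), "retrying"),
]
# Only warnings and worse, only traces, only the last hour.
selected = select_for_transfer(logs, {"trace"}, Level.WARNING, now - timedelta(hours=1), now)
print([r.message for r in selected])
```

    The verbose trace, the performance counter, and the two-hour-old warning are all left behind, so only one record would be transferred.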
  • Windows Azure Storage
    Rich data abstractions – tables, blobs, queues, drives, CDN
    Capacity (100TB), throughput (100MB/sec), transactions (1K req/sec)
    High accessibility
    Supports geo-location
    Language & platform agnostic REST APIs
    URL: http://<account>.<store>.core.windows.net
    Client libraries for .NET, Java, PHP, etc.
    High durability – data is replicated 3 times within a cluster, and (Feb 2010) across datacenters
    High scalability – data is automatically partitioned and load balanced across servers
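    The endpoint pattern is the same for every storage service; only the `<store>` segment changes (`blob`, `table`, or `queue`). A small Python sketch, assuming those three store names; `storage_url` is a hypothetical helper, not part of any client library, and the account, container, and queue names are placeholders.

```python
from urllib.parse import quote

STORES = {"blob", "table", "queue"}   # store names used in the endpoint host


def storage_url(account: str, store: str, *path: str, scheme: str = "http") -> str:
    """Build <scheme>://<account>.<store>.core.windows.net[/<path>/...]."""
    if store not in STORES:
        raise ValueError(f"unknown store: {store}")
    base = f"{scheme}://{account}.{store}.core.windows.net"
    if not path:
        return base
    # Percent-encode each path segment, then join the segments with "/".
    return base + "/" + "/".join(quote(p) for p in path)


print(storage_url("myaccount", "blob", "climate", "tas_2001-01-01.csv"))
print(storage_url("myaccount", "queue", "flatten-jobs"))
```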
  • Windows Azure Table Storage
    Designed for structured data, not relational data
    Data definition is part of the application
    A Table is a set of Entities (records)
    An Entity is a set of Properties (fields)
    No fixed schema
    Each property is stored as a <name, typed value> pair
    Two entities within the same table can have different properties
    No schema is enforced
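    Because no schema is enforced, a table behaves like a keyed collection of property bags. The in-memory Python sketch below models that. `PartitionKey` and `RowKey` are the system properties every Windows Azure table entity carries (the service also maintains `Timestamp`, omitted here); the `Table` class and the other property names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Entity = Dict[str, Any]


class Table:
    """In-memory stand-in for a schemaless table keyed by (PartitionKey, RowKey)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Entity] = {}

    def insert(self, entity: Entity) -> None:
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._rows:
            raise KeyError(f"entity already exists: {key}")
        self._rows[key] = dict(entity)

    def query(self, predicate: Callable[[Entity], bool]) -> List[Entity]:
        return [e for e in self._rows.values() if predicate(e)]


t = Table()
# Two entities in the same table with different sets of properties.
t.insert({"PartitionKey": "2001-01-01", "RowKey": "0001", "lat": 35.9, "lon": -84.3, "temp": 271.4})
t.insert({"PartitionKey": "lookup", "RowKey": "2001-01-01", "csv_blob": "tas_2001-01-01.csv"})

# Properties are per entity, so a filter has to tolerate missing ones.
cold = t.query(lambda e: e.get("temp", float("inf")) < 273.15)
print(len(cold))
```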
  • Windows Azure Blob Storage
    Storage for large, named files plus their metadata
    Block Blob
      Targeted at streaming workloads
      Each blob consists of a sequence of blocks
      Each block is identified by a Block ID
      Size limit 200GB per blob
    Page Blob
      Targeted at random read/write workloads
      Each blob consists of an array of pages
      Each page is identified by its offset from the start of the blob
      Size limit 1TB per blob
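    Uploading a block blob means splitting the data into blocks, naming each one with a Block ID, and then committing the ordered list of IDs. The Python sketch below covers only the splitting and naming, with no network calls. It assumes the service rules that Block IDs are Base64 strings of equal length within one blob and that page blob writes align to 512-byte pages. The 4 MB block size is an assumption about the per-block limit of the time (50,000 such blocks would match the 200GB figure above), and `split_into_blocks` / `check_page_range` are hypothetical helpers.

```python
import base64
from typing import List, Tuple

BLOCK_SIZE = 4 * 1024 * 1024   # assumed per-block limit (4 MB)
PAGE_SIZE = 512                # page blobs are addressed in 512-byte pages


def block_id(index: int) -> str:
    # Zero-padding gives every raw ID the same length, so every Base64 ID
    # in the blob has the same length too.
    return base64.b64encode(f"block-{index:06d}".encode("ascii")).decode("ascii")


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[Tuple[str, bytes]]:
    """Return (block_id, chunk) pairs in the order they should be committed."""
    return [
        (block_id(i), data[offset:offset + block_size])
        for i, offset in enumerate(range(0, len(data), block_size))
    ]


def check_page_range(offset: int, length: int) -> None:
    """Raise if a page blob write does not start and end on page boundaries."""
    if offset % PAGE_SIZE or length % PAGE_SIZE:
        raise ValueError("page blob writes must be 512-byte aligned")


blocks = split_into_blocks(b"x" * 10, block_size=4)
print([len(chunk) for _, chunk in blocks])
```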
  • Windows Azure Queue
    Performance-efficient and highly available, with reliable message delivery
    Asynchronous work dispatch
    Inter-role communication
    Polling based model; best-effort FIFO data structure
    Queue operations
    Create Queue
    Delete Queue
    List Queues
    Get/Set Queue Metadata
    Message operations
    Add Message
    Get Message(s)
    Peek Message(s)
    Delete Message
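    The polling model works because Get Message hides a message for a visibility timeout rather than removing it; only Delete Message removes it. A worker that dies before deleting therefore lets the message reappear for another worker, which is why delivery is best-effort FIFO and at-least-once. The in-memory Python sketch below illustrates that get/process/delete cycle; `InMemoryQueue`, its methods, and the explicit `now` clock argument are hypothetical, since the real service tracks visibility on the server.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Message:
    msg_id: int
    body: str
    visible_at: float = 0.0   # time at which the message becomes visible again
    dequeue_count: int = 0


class InMemoryQueue:
    def __init__(self) -> None:
        self._messages: Dict[int, Message] = {}
        self._ids = itertools.count()

    def add_message(self, body: str) -> None:
        msg_id = next(self._ids)
        self._messages[msg_id] = Message(msg_id, body)

    def get_message(self, now: float, visibility_timeout: float) -> Optional[Message]:
        # Oldest visible message first (best-effort FIFO); it is hidden, not removed.
        for msg in sorted(self._messages.values(), key=lambda m: m.msg_id):
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout
                msg.dequeue_count += 1
                return msg
        return None

    def delete_message(self, msg: Message) -> None:
        self._messages.pop(msg.msg_id, None)


q = InMemoryQueue()
q.add_message("flatten tas_2001.nc")

first = q.get_message(now=0.0, visibility_timeout=30.0)
# The worker "crashes" without deleting, so the message stays hidden until t=30.
assert q.get_message(now=10.0, visibility_timeout=30.0) is None
retry = q.get_message(now=31.0, visibility_timeout=30.0)
print(retry.body, retry.dequeue_count)
q.delete_message(retry)   # processing succeeded: now it is really gone
```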
  • Windows Azure Drive
    Provides a durable NTFS volume for Windows Azure applications to use
    Use existing NTFS APIs to access a durable drive
    Durability and survival of data on application failover
    Enables migrating existing NTFS applications to the cloud
    Drives can be up to 1TB; a VM can dynamically mount up to 8 drives
    A Windows Azure Drive is a Page Blob
    Example: mount a Page Blob as X:
    http://<account>.blob.core.windows.net/<container>/<blob>
    All writes to drive are made durable to the Page Blob
    Drive made durable through standard Page Blob replication
  • Windows Azure Content Delivery Network
    Provides high-bandwidth global blob content delivery
    18 locations globally (US, Europe, Asia, Australia and South America), and growing
    Blob service URL vs. CDN URL
    Blob URL: http://<account>.blob.core.windows.net/
    CDN URL: http://<guid>.vo.msecnd.net/
    Support for custom domain names
    Access details
    Blobs are cached in CDN until the TTL passes
    Use per-blob HTTP Cache-Control policy for TTL (new)
    CDN provides only anonymous HTTP access
  • Tenets of Internet-Scale Application Architecture
    Design
    • Horizontal scaling
    • Service-oriented composition
    • Eventual consistency
    • Fault tolerant (expect failures)
    Security
    • Claims-based authentication & access control
    • Federated identity
    • Data encryption & key mgmt.
    Management
    • Policy-driven automation
    • Aware of application lifecycles
    • Handle dynamic data schema and configuration changes
    Data & Content
    • De-normalization
    • Logical partitioning
    • Distributed in-memory cache
    • Diverse data storage options (persistent & transient, relational & unstructured, text & binary, read & write, etc.)
    Processes
    • Loosely coupled components
    • Parallel & distributed processing
    • Asynchronous distributed communication
    • Idempotent (handle duplicates)
    • Isolation (separation of concerns)
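    Since queue messages can be delivered more than once, a job should leave the same result no matter how many times it runs. One simple way to get there, sketched below in Python under stated assumptions, is to derive each output name deterministically from the job's input and write with overwrite semantics, so a duplicate run replaces its earlier output instead of adding a second copy. The `blob_store` dictionary, the naming scheme in `output_name`, and `process_job` are hypothetical illustrations, not the application's actual code.

```python
from typing import Dict

# Stand-in for blob storage: blob name -> content (hypothetical).
blob_store: Dict[str, bytes] = {}


def output_name(source_file: str, period: str) -> str:
    # Deterministic: the same (source, period) pair always maps to the same name.
    return f"csv/{source_file}/{period}.csv"


def process_job(source_file: str, period: str, rows: str) -> str:
    """Write the job's output under a deterministic name, overwriting any earlier copy."""
    name = output_name(source_file, period)
    blob_store[name] = rows.encode("utf-8")   # put/overwrite, never "add another copy"
    return name


csv_text = "lat,lon,temp\n35.9,-84.3,271.4\n"
first = process_job("tas_2001.nc", "2001-01-01", csv_text)
duplicate = process_job("tas_2001.nc", "2001-01-01", csv_text)   # redelivered message
print(first == duplicate, len(blob_store))
```

    By contrast, naming each output with a fresh random ID per run would leave one copy behind for every delivery of the same message.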
  • Application Goals
    Simulate Post-Processing of Scientific Data
    • Generate Visualizations from “raw” data
    • Transform data to be consumable by general processes
    • Exercise various storage mechanisms
    Focus on Mechanics
    • The specific science problem being solved is secondary to the approach
    • Goal is to refine the approach until it fades into the background, allowing the science to regain preeminence
  • Putting Data Into the Cloud
    Source Data
    • NetCDF files – subset of US contribution to CMIP3 archive
    Visualization Support
    • Flatten Source Files to CSV
    • Generate base “heat map”
    • Combine heat map and base map
    • Generate Video/Animation
    General Consumption/Publishing
    • Expose data as a “service” (REST/XML/JSON, etc.)
    • Query-able
    • Azure Tables (OGDI) / Azure Blob
  • Application Patterns
    Grid / Parallel Computing Application (architecture diagram spanning a private cloud and public services)
    • Clients: Silverlight Application, Web Browser, Mobile Browser, WPF Application
    • Cloud roles: ASP.NET (Web Role), Web Svc (Web Role), Jobs (Worker Role)
    • Services: Application, Data, Storage, Identity, Workflow, Table Storage, Blob Storage, Queue, Service Bus, Access Control Service
    • Enterprise: Enterprise Application, Enterprise Web Svc, Enterprise Data, Enterprise Identity
    • Data: User Data, Application Data, Reference Data
  • Application Flow
    Flatten NetCDF
    • Message From Q
    • Download Binary File
    • For each Time Period…
      • Flatten to CSV (memory)
      • Upload to Blob Storage
      • Queue Table Load Job
      • Queue Gen Image Job
    Generate Image
    • Message From Q
    • Download CSV
    • Generate Image
    • Size Image
    • Upload to Blob Storage
    • Combine with Overlay
    • Upload to Blob Storage
    Table Loader
    • Message From Q
    • Download CSV
    • Read In Rows
    • For each Set of 100…
      • Submit Batch To Table
    • Period in Lookup Table
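    The “Flatten to CSV (memory)” step turns one time period of a gridded variable into lat/lon/temp rows without touching local disk. Below is a standard-library Python sketch of that step. Reading the grid out of the NetCDF file (for example with a NetCDF library) is omitted, so the `lats`, `lons`, and `grid` inputs and the `flatten_period` function are hypothetical.

```python
import csv
import io
from typing import Sequence


def flatten_period(lats: Sequence[float], lons: Sequence[float],
                   grid: Sequence[Sequence[float]]) -> str:
    """Flatten a [lat][lon] grid for one time period into CSV text (lat,lon,temp)."""
    if len(grid) != len(lats) or any(len(row) != len(lons) for row in grid):
        raise ValueError("grid shape must be len(lats) x len(lons)")
    buffer = io.StringIO()   # in memory: nothing is written to local disk
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["lat", "lon", "temp"])
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            writer.writerow([lat, lon, grid[i][j]])
    return buffer.getvalue()


text = flatten_period([10.0, 20.0], [0.0, 2.5, 5.0],
                      [[280.1, 281.2, 282.3], [270.4, 271.5, 272.6]])
print(text.splitlines()[1])
```

    The returned string can be uploaded to blob storage as-is, after which the worker queues the table-load and image-generation jobs for that period.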
  • Context
    35 TB of numbers – How Much Data Is That?
    • A single lat/lon map at typical climate model resolution represents ~40 KB
    • If you wanted to look at all 35 TB in the form of these lat/lon plots and if…
      • Every 10 seconds you displayed another map
      • You worked 24 hours/day, 365 days/year
    • You could complete the task in about 200 years.
    Dataset Used
    • 1 NetCDF file, approximately 92 MB, located in blob storage
    • 1,825 CSV files generated
      • 815.84 MB total
      • Average file size is around 457.76 KB
      • Each CSV represented 12,960 data points (lat/lon/temp)
    • 3,650 images generated
      • 145.03 MB total
      • Heat maps avg. 31.25 KB
      • Combined images avg. 49 KB
    • 23,652,000 entities added to Azure Table
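    The figures above can be checked with a few lines of arithmetic. The sketch assumes a 365-day year and computes the viewing time in both decimal (1 TB = 10^12 bytes) and binary units, since the slide does not say which it uses; either way the result is roughly 280 to 300 years, the same order of magnitude as the slide's “about 200 years”. It also divides the dataset totals by the 1,825 CSV files: 23,652,000 entities works out to 12,960 per file, and 815.84 MB works out to the stated 457.76 KB average when 1 MB = 1,024 KB.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 24 hours/day, 365 days/year


def years_to_view(total_bytes: float, map_bytes: float, seconds_per_map: float) -> float:
    """Years needed to look at every map if a new one is shown every seconds_per_map."""
    maps = total_bytes / map_bytes
    return maps * seconds_per_map / SECONDS_PER_YEAR


decimal_years = years_to_view(35e12, 40e3, 10)             # 1 TB = 10**12 B, 1 KB = 10**3 B
binary_years = years_to_view(35 * 2**40, 40 * 2**10, 10)   # 1 TB = 2**40 B, 1 KB = 2**10 B

entities_per_csv = 23_652_000 / 1_825   # table entities per CSV file
avg_csv_kb = 815.84 * 1024 / 1_825      # average CSV size, with 1 MB = 1,024 KB

print(f"{decimal_years:.0f} years (decimal units), {binary_years:.0f} years (binary units)")
print(f"{entities_per_csv:.0f} entities per CSV, {avg_csv_kb:.2f} KB per CSV")
```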
  • Lessons
    Performance Counters
    • Take advantage of the new logging infrastructure within Azure to understand how your application is behaving.
    • However, like food at the dinner table, only take what you can eat.
  • Flatten Operation – Processor utilization ~16% during active work
  • Image Generation – Processor utilization ~95% during active work
  • Table Load – Processor utilization ~57% during active work
  • Lessons
    Tracing Infrastructure
    • Huge improvements from CTP to v1
    • Use categories to filter / limit what you transfer out
    • My eyes were bigger than my stomach
    Table Maintenance
    • (nodes * counters) + (nodes * trace) == lots of data
    • Plan early for how you are going to maintain the WAD* tables
    • Remember… redundancy/availability has a cost (performance)
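    The formula (nodes * counters) + (nodes * trace) grows with both the instance count and the sampling rate, which is why the diagnostics tables fill up so quickly. A back-of-the-envelope Python sketch, assuming one row per counter sample and one row per trace event; every input value in the example is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def diagnostics_rows_per_day(nodes: int, counters: int, sample_interval_s: int,
                             trace_rows_per_node_per_day: int) -> int:
    """(nodes * counters) + (nodes * trace), scaled to one day of collection."""
    samples_per_day = SECONDS_PER_DAY // sample_interval_s
    counter_rows = nodes * counters * samples_per_day
    trace_rows = nodes * trace_rows_per_node_per_day
    return counter_rows + trace_rows


# Hypothetical deployment: 10 instances, 6 counters sampled every 30 s,
# and 5,000 trace rows per instance per day.
per_day = diagnostics_rows_per_day(10, 6, 30, 5_000)
print(per_day, per_day * 30)   # rows per day, rows after a month with no cleanup
```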
  • Flatten: CSV Upload Time
    Over 40,349 attempts, 249.99 ms (79.12 ms) with a rate of 15.63 Mbit/s (4.74).
    Avg file size: 457.76 KB
  • Flatten: CSV Upload Rate
    Over 40,349 attempts, 249.99 ms (79.12 ms) with a rate of 15.63 Mbit/s (4.74).
    Avg file size: 457.76 KB
  • Flatten: Queue Insert Duration
    Over 40,345 attempts, with a message size of 616b, insertion time averaged 254.96 ms (68.86 ms).
  • Flatten: Single Table Entity Insert
    Over 40,353 attempts, average insertion time of 248.63 ms (108.16 ms).
  • ImageGen: CSV File Download Duration
    Over 40,349 attempts, 249.99 ms (79.12 ms) with a rate of 15.63 Mbit/s (4.74).
    Avg file size: 457.76 KB
  • ImageGen: CSV File Download Rate
    Over 40,349 attempts, 249.99 ms (79.12 ms) with a rate of 15.63 Mbit/s (4.74).
    Avg file size: 457.76 KB
  • ImageGen: Image Generation and Resizing
    Over 24,687 attempts, average generation time was 3.7 s (0.283 s).
  • ImageGen: Image File Upload Duration
    Over 24,688 attempts, 88.14 ms (44.84 ms) with a rate of 3.02 Mbit/s (0.614).
    Avg file size: 32 KB
  • ImageGen: Image File Upload Rate
    Over 24,688 attempts, 88.14 ms (44.84 ms) with a rate of 3.02 Mbit/s (0.614).
    Avg file size: 32 KB
  • TableLoad: Batch Insert Rate
    Over 89,202 batches (100 records each), average duration was 1.447 s (0.316 s).
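    The reported transfer rates line up with megabits per second: dividing the average file size by the average duration gives nearly the same figure. The Python sketch below performs that division, assuming 1 KB = 1,024 bytes. Because the slides appear to report the mean of per-transfer rates rather than a rate computed from the means, the two agree only approximately.

```python
def mbit_per_s(size_kb: float, duration_ms: float) -> float:
    """Throughput in megabits per second (10**6 bits/s), with 1 KB = 1,024 bytes."""
    bits = size_kb * 1024 * 8
    return bits / (duration_ms / 1000) / 1_000_000


csv_rate = mbit_per_s(457.76, 249.99)   # slides report 15.63
image_rate = mbit_per_s(32, 88.14)      # slides report 3.02
print(round(csv_rate, 2), round(image_rate, 2))
```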
  • Lessons
    Data
    • Generic formats tend to be large (the 92 MB NetCDF file became 816 MB of CSV)
    • Data transfer within the Azure datacenter is fast (from your computer, it is slow)
    • Think about transport overhead (ATOM/JSON/CSV/etc. – 9x larger)
    • Use async calls for data uploads/downloads (use your CPU cycles wisely – you are paying for them)
    Azure Tables
    • Inserts/deletes are slow but scale relatively linearly
    • Partition keys cannot be enumerated by a query… store them yourself
    • Not well suited for “changing” data
    • If you are using the client library/ADO.NET Data Services, be careful how you handle async calls – you can lose context
    • Use batch updates wherever possible (1 insert in 0.24863 s vs. 100 in 1.447 s); 6 individual inserts take longer than 100 in a single batch
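    Batching pays off because the per-request cost dominates: at the measured averages, 100 single inserts take about 24.9 s, while one 100-entity batch takes 1.447 s. A minimal Python sketch of preparing batches, assuming the entity group transaction rules (at most 100 entities per batch, all sharing one PartitionKey). `make_batches` and `estimated_seconds` are hypothetical helpers, and submitting each batch is left to whichever client library is in use.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List

Entity = Dict[str, Any]
MAX_BATCH = 100             # entities per entity group transaction
SINGLE_INSERT_S = 0.24863   # measured average, single-entity insert
BATCH_INSERT_S = 1.447      # measured average, 100-entity batch


def make_batches(entities: List[Entity]) -> Iterator[List[Entity]]:
    """Group entities by PartitionKey, then yield chunks of at most MAX_BATCH."""
    by_partition: Dict[str, List[Entity]] = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)
    for group in by_partition.values():
        for start in range(0, len(group), MAX_BATCH):
            yield group[start:start + MAX_BATCH]


def estimated_seconds(n_entities: int, batched: bool) -> float:
    """Rough serial time from the averages above, assuming a single partition."""
    if batched:
        n_batches = -(-n_entities // MAX_BATCH)   # ceiling division
        return n_batches * BATCH_INSERT_S
    return n_entities * SINGLE_INSERT_S


rows = [{"PartitionKey": "2001-01-01", "RowKey": f"{i:05d}"} for i in range(250)]
rows += [{"PartitionKey": "2001-01-02", "RowKey": f"{i:05d}"} for i in range(30)]
print([len(b) for b in make_batches(rows)])
print(estimated_seconds(1_000, batched=False), estimated_seconds(1_000, batched=True))
```

    Grouping by partition first matters: a batch that mixed partition keys would be rejected, so 280 rows across two partitions become four batches rather than three.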
  • Lessons
    General
    • Timeouts happen – expect/plan for them (exponential back-off & retry policies)
    • Design for idempotency
    • Watch your compilation model (x86 vs. x64)
    • Data transfer within the Azure datacenter is fast (from your computer, it is slow)
    • Don’t re-invent the wheel – use the available tools when practical
    • PowerShell, PowerPivot, Log Parser, and the .NET charting libraries are your friends.
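    Timeouts and idempotency go together: retrying is only safe when repeating the operation does no harm. Below is a generic Python sketch of a retry policy with exponential back-off. The `TimeoutError` trigger, the base delay, the cap, the attempt limit, and the random “full jitter” are illustrative assumptions; the .NET storage client library of the time included its own retry policies, which are the better choice where available.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T], max_attempts: int = 5,
                 base_delay_s: float = 0.5, max_delay_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(), retrying on TimeoutError with exponential back-off.

    Before retry n (1-based) it waits a random time in
    [0, min(max_delay_s, base_delay_s * 2 ** (n - 1))].
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise   # out of attempts: surface the failure to the caller
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Usage: an operation that times out twice, then succeeds.
calls = {"n": 0}


def flaky_upload() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated storage timeout")
    return "uploaded"


delays: List[float] = []
result = with_retries(flaky_upload, sleep=delays.append)   # record delays instead of sleeping
print(result, calls["n"], len(delays))
```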
  • Thank you
    gillenre@ornl.gov
    rob.gillenfamily.net
    @argodev
  • The Microsoft Cloud
    Data Center Infrastructure
  • The Microsoft Cloud
    ~100 Globally Distributed Data Centers
    Quincy, WA
    Chicago, IL
    San Antonio, TX
    Dublin, Ireland
    Generation 4 DCs