Azure Sample for Climate Analysis
Presentation I gave at the Microsoft Public Sector/Healthcare & Life Sciences Dinner and Cloud Computing Showcase held during PDC.



  • For updates to this content, please download the latest Azure Services Platform Training Kit from: http://www.azure.com

Azure Sample for Climate Analysis: Presentation Transcript

  • Large Scale Scientific Data
    Notes From The Field
    Rob Gillen
    Computer Science Research
    Oak Ridge National Laboratory
    Planet Technologies, Inc.
  • ORNL is DOE’s largest science and energy laboratory
    • World’s most powerful open scientific computing facility
    • Nation’s largest concentration of open source materials research
    • $1.3B budget
    • 4,350 employees
    • 3,900 research guests annually
    • $350 million invested in modernization
    • Nation’s most diverse energy portfolio
    • Operating the world’s most intense pulsed neutron source
    • Managing the billion-dollar U.S. ITER project
  • Leading the development of ultrascale scientific computing
    • Leadership Computing Facility:
    • World’s most powerful open scientific computing facility
    • Jaguar XT operating at >1.64 petaflops
    • Exascale system by the end of the next decade
    • Focus on computationally intensive projects of large scale and high scientific impact
    • Just upgraded to ~225,000 cores
    • Addressing key science and technology issues
    • Climate
    • Fusion
    • Materials
    • Bioenergy
    Managed by UT-Battelle for the Department of Energy
  • Initial Context
    Studying the intersection of HPC/scientific computing and the cloud
    Data locality is a key issue for us
    Cloud computing looks to fill a niche in pre- and post-processing as well as generalized mid-range compute
    This project is an introductory or preparatory step into the larger research project
  • Sample Application Goals
    Make CMIP3 data more accessible/consumable
    Prototype the use of cloud computing for post-processing of scientific data
    Answer the questions:
    Can cloud computing be used effectively for large-scale data?
    How accessible is the programming paradigm?
    Note: the focus is on the mechanics, not the science (the data could just as well be the number of foobars in the world rather than temperature simulations)
  • Technologies Utilized
    Windows Azure (tables, blobs, queues, web roles, worker roles)
    OGDI (http://ogdisdk.cloudapp.net/)
    C#, F#, PowerShell, DirectX, SilverLight, WPF, Bing Maps (Virtual Earth), GDI+, ADO.NET Data Services
  • Two-Part Problem
    Get the data into the cloud and expose it so that generic clients can consume it in Internet-friendly formats
    Provide a visualization or sample application that gives context and meaning to the data.
  • Context: 35 Terabytes of numbers - How much data is that?
    A single latitude/longitude map at typical climate model resolution is about 40 KB.
    If you wanted to look at all 35 TB in the form of these latitude/longitude plots, and if
    every 10 seconds you displayed another map, and if
    you worked 24 hours a day, 365 days each year,
    you could complete the task in about 200 years.
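The arithmetic behind the slide's estimate can be checked directly. A minimal sketch, assuming decimal units (35 TB = 35×10¹² bytes, 40 KB = 40×10³ bytes per map); the result lands in the same few-hundred-year ballpark as the slide's rounder "about 200 years", with the exact figure depending on the assumed map size and unit convention:

```python
# Back-of-envelope check of "35 TB viewed as 40 KB maps, one every 10 s".
TOTAL_BYTES = 35e12          # 35 TB of model output (decimal units assumed)
BYTES_PER_MAP = 40e3         # one lat/lon map at typical model resolution
SECONDS_PER_MAP = 10         # a new map displayed every 10 seconds

maps = TOTAL_BYTES / BYTES_PER_MAP
seconds = maps * SECONDS_PER_MAP
years = seconds / (365 * 24 * 3600)

print(f"{maps:,.0f} maps, roughly {years:.0f} years of continuous viewing")
```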
  • Dataset Used
    5 GB worth of NetCDF files
    Contributing Sources
    NOAA Geophysical Fluid Dynamics Laboratory, CM2.0 Model
    NASA Goddard Institute for Space Studies, C4x3
    NCAR Parallel Climate Model (Version 1)
    Climate of the 20th Century Experiment, run 1, daily
    Surface Air Temperature (tas)
    Maximum Surface Air Temperature (tasmax)
    Minimum Surface Air Temperature (tasmin)
    > 1.1 billion unique values (lat/lon/temp tuples)
    0.014% of the total set
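The 0.014% figure is consistent with the 5 GB subset measured against the full 35 TB archive, as a quick check shows (binary GiB-per-TiB units assumed; decimal units give essentially the same result):

```python
# Sanity check: the 5 GB NetCDF subset as a fraction of the 35 TB archive.
subset_gb = 5
total_gb = 35 * 1024          # 35 TB expressed in (binary) GB
fraction = subset_gb / total_gb
print(f"{fraction:.3%}")
```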
  • Application Workflow
    Source files are uploaded to blob storage
    Each source file is split into thousands of CSV files stored in blob storage
    The process generates a Load Table command for each CSV created
    Load Table workers process these jobs and load the CSV data into Azure Tables
    Once a CSV file has been processed, a Create Image job is created
  • Application Workflow (continued)
    Create Image workers process the queue, generating a heat map image for each time set
    Once all data is loaded and the images are created, a video is rendered from the resulting images for use in the visualization applications.
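The chained, queue-driven pipeline above can be sketched as a small in-process simulation. Real workers would poll Azure Queue storage; here Python's `queue.Queue` stands in, and the job names, payloads, and handler structure are illustrative assumptions, not code from the deck:

```python
# Minimal simulation of the split -> Load Table -> Create Image workflow.
import queue

jobs = queue.Queue()

def split_source_file(name, parts=3):
    """Splitting one source file enqueues a Load Table job per CSV produced."""
    for i in range(parts):
        jobs.put(("load_table", f"{name}.part{i}.csv"))

def handle(kind, payload):
    if kind == "load_table":
        # ...a real worker would bulk-load the CSV rows into Azure Tables...
        jobs.put(("create_image", payload))   # chain the next pipeline stage
        return f"loaded {payload}"
    if kind == "create_image":
        # ...a real worker would render the heat map for this time set...
        return f"rendered heat map for {payload}"

split_source_file("tas_run1")
results = []
while not jobs.empty():
    results.append(handle(*jobs.get()))
print(len(results), "jobs processed")
```

Because each Load Table job enqueues a follow-on Create Image job, the stages stay decoupled and workers of either kind can be scaled independently.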
  • Current Data Loaded
    > 1.1 billion table entries (lat/lon/value)
    > 250,000 blobs
    > 75 GB (blob storage only)
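The deck doesn't show the table schema behind those 1.1 billion entries. One plausible Azure-Table-style key design (an assumption, not from the slides) partitions by time period, so one partition holds one lat/lon map and row keys sort geographically within it:

```python
# Hypothetical entity shape for the lat/lon/value table entries.
# PartitionKey/RowKey are the standard Azure Table storage key pair;
# the specific formats here are illustrative assumptions.
def make_entity(time_period, lat, lon, value):
    return {
        "PartitionKey": f"{time_period:08d}",      # one partition per time set
        "RowKey": f"{lat:+08.3f}_{lon:+08.3f}",    # zero-padded, sortable lat/lon
        "Value": value,                            # e.g. surface air temperature
    }

e = make_entity(40148, lat=36.010, lon=-84.270, value=287.4)
print(e["PartitionKey"], e["RowKey"])
```

Grouping a whole time set under one partition key would let a Create Image worker fetch everything it needs for one heat map with a single partition scan.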
  • Data Load Review
    Results for the first subset
    Averaged 2:30 (min:sec) per time period
    40,149 time periods
    24 time periods per worker-hour
    1,672.8 worker-hours
    14 active workers
    119.5 calendar hours
    328,900,608 total entities
    Near-linear scale-out
    This represents 0.003428% of the total set
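The load statistics above all follow from two primitive numbers: 40,149 time periods at an average of 2:30 (150 seconds) each, spread across 14 workers. A quick re-derivation:

```python
# Re-derive the data-load statistics from the slide's primitive figures.
periods = 40_149
seconds_per_period = 150      # the 2:30 average per time period
workers = 14

worker_hours = periods * seconds_per_period / 3600
per_worker_hour = 3600 / seconds_per_period
calendar_hours = worker_hours / workers

print(f"{worker_hours:.3f} worker-hours, "
      f"{per_worker_hour:.0f} periods/worker-hour, "
      f"{calendar_hours:.1f} calendar hours")
```

That the calendar time is almost exactly worker-hours divided by worker count is what the slide means by near-linear scale-out.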
  • WPF Data Visualization Application
    Demo