Azure Sample for Climate Analysis

Presentation I gave at the Microsoft Public Sector/Healthcare & Life Sciences Dinner and Cloud Computing Showcase held during PDC.


  • For updates to this content, please download the latest Azure Services Platform Training Kit from: http://www.azure.com

Transcript

    • 1. Large Scale Scientific Data
      Notes From The Field
      Rob Gillen
      Computer Science Research
      Oak Ridge National Laboratory
      Planet Technologies, Inc.
    • 2. ORNL is DOE’s largest science and energy laboratory
      • World’s most powerful open scientific computing facility
      • 3. Nation’s largest concentration of open source materials research
      • 4. $1.3B budget
      • 5. 4,350 employees
      • 6. 3,900 research guests annually
      • 7. $350 million invested in modernization
      • 8. Nation’s most diverse energy portfolio
      • 9. Operating the world’s most intense pulsed neutron source
      • 10. Managing the billion-dollar U.S. ITER project
    • Leading the development of ultrascale scientific computing
      • Leadership Computing Facility:
      • 11. World’s most powerful open scientific computing facility
      • 12. Jaguar XT operating at >1.64 petaflops
      • 13. Exascale system by the end of the next decade
      • 14. Focus on computationally intensive projects of large scale and high scientific impact
      • 15. Just upgraded to ~225,000 cores
      • 16. Addressing key science and technology issues
      • 17. Climate
      • 18. Fusion
      • 19. Materials
      • 20. Bioenergy
      Managed by UT-Battelle for the Department of Energy
    • 21. Initial Context
      Studying the intersection of HPC/scientific computing and the cloud
      Data locality is a key issue for us
      Cloud computing looks to fill a niche in pre- and post-processing as well as generalized mid-range compute
      This project is an introductory or preparatory step into the larger research project
    • 22. Sample Application Goals
      Make CMIP3 data more accessible/consumable
      Prototype the use of cloud computing for post-processing of scientific data
      Answer the questions:
      Can cloud computing be used effectively for large-scale data?
      How accessible is the programming paradigm?
      Note: the focus is on the mechanics, not the science (the data could just as well be counts of foobars as temperature simulations)
    • 23. Technologies Utilized
      Windows Azure (tables, blobs, queues, web roles, worker roles)
      OGDI (http://ogdisdk.cloudapp.net/)
      C#, F#, PowerShell, DirectX, Silverlight, WPF, Bing Maps (Virtual Earth), GDI+, ADO.NET Data Services
    • 24. Two-Part Problem
      Get the data into the cloud/exposed in such a way as to be consumable by generic clients in Internet-friendly formats
      Provide some sort of visualization or sample application to provide context/meaning to the data.
    • 25. Context: 35 Terabytes of numbers - How much data is that?
      A single latitude/longitude map at typical climate model resolution represents about 40 KB.
      If you wanted to look at all 35 TB in the form of these latitude/longitude plots, and if
      every 10 seconds you displayed another map, and
      you worked 24 hours a day, 365 days each year,
      you could complete the task in about 200 years.
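The slide's back-of-envelope figure can be sanity-checked in a few lines. This is a sketch assuming decimal units (35 TB = 35×10¹² bytes, 40 KB = 40×10³ bytes per map); depending on the rounding used, the answer lands in the low hundreds of years, the same order of magnitude as the slide's estimate:

```python
# Sanity check of the "~200 years" estimate, assuming decimal units:
# 35 TB of data, one 40 KB map viewed every 10 seconds, nonstop.
total_bytes = 35e12
bytes_per_map = 40e3
seconds_per_map = 10

maps = total_bytes / bytes_per_map                  # 875 million maps
years = maps * seconds_per_map / (365 * 24 * 3600)  # seconds -> years
print(f"{maps:,.0f} maps, roughly {years:.0f} years of nonstop viewing")
```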
    • 26. Dataset Used
      5 GB worth of NetCDF files
      Contributing Sources
      NOAA Geophysical Fluid Dynamics Laboratory, CM2.0 Model
      NASA Goddard Institute for Space Studies, C4x3
      NCAR Parallel Climate Model (Version 1)
      Climate of the 20th Century Experiment, run 1, daily
      Surface Air Temperature (tas)
      Maximum Surface Air Temperature (tasmax)
      Minimum Surface Air Temperature (tasmin)
      > 1.1 billion unique values (lat/lon/temp tuples)
      0.014 % of total set
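The 0.014% figure follows directly from the two sizes on the slide, assuming decimal units (1 TB = 1000 GB):

```python
# The subset fraction quoted on the slide: 5 GB out of the 35 TB archive.
subset_gb = 5
total_gb = 35 * 1000  # 35 TB in decimal GB
fraction = subset_gb / total_gb
print(f"{fraction:.3%} of the full archive")  # 0.014%
```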
    • 27. Application Workflow
      Source files are uploaded to blob storage
      Each source file is split into thousands of CSV files stored in blob storage
      Process generates a Load Table command for each CSV created
      Load Table workers process jobs and load CSV data into Azure Tables.
      Once a CSV file has been processed, a Create Image job is created
    • 28. Application Workflow
      Create Image workers process queue, generating a heat map image for each time set
      Once all data is loaded and images are created, a video is rendered based on the resulting images and used for inclusion in visualization applications.
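The two workflow slides describe a queue-chained pipeline: finishing a "Load Table" job enqueues the matching "Create Image" job. A minimal sketch of that chaining pattern, with all names hypothetical (the actual sample used C# worker roles polling Azure Queues; Python's stdlib queues stand in here):

```python
import queue

# Illustrative sketch of the queue-driven pipeline described above.
load_jobs = queue.Queue()    # "Load Table" commands, one per CSV
image_jobs = queue.Queue()   # "Create Image" jobs, chained from loads
rendered = []

def load_table_worker():
    # Drain the load queue: parse each CSV into table entities, then
    # enqueue the follow-on image job for the same time slice.
    while not load_jobs.empty():
        csv_name = load_jobs.get()
        # ... insert the CSV's lat/lon/value rows into table storage ...
        image_jobs.put(csv_name)
        load_jobs.task_done()

def create_image_worker():
    # Drain the image queue: render a heat-map image per time slice.
    while not image_jobs.empty():
        rendered.append(image_jobs.get().replace(".csv", ".png"))
        image_jobs.task_done()

for n in range(3):
    load_jobs.put(f"tas_{n:04d}.csv")
load_table_worker()
create_image_worker()
print(rendered)  # ['tas_0000.png', 'tas_0001.png', 'tas_0002.png']
```

In the real system the stages run as independent worker-role instances, so the queues also act as the scale-out mechanism: adding workers drains each stage faster without any coordination beyond the queue itself.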
    • 29. Current Data Loaded
      > 1.1 billion table entries (lat/lon/value)
      > 250,000 blobs
      > 75 GB (only blob)
    • 30. Data Load Review
      Results for first subset
      Averaged 2:30/time period
      40,149 time periods
      24 per worker hour
      1,672.8 worker-hours
      14 active workers
      119.5 calendar hours
      328,900,608 total entities
      Near-linear scale out
      This represents 0.003428 % of total set
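The load-review numbers are internally consistent and can be re-derived from the two base figures (40,149 periods at 2:30 each, 14 workers); note that 328,900,608 ÷ 40,149 is exactly 8,192, suggesting each time period is a fixed grid of 8,192 lat/lon points:

```python
# Re-derive the load statistics from the slide's base numbers.
minutes_per_period = 2.5   # "2:30" average per time period
periods = 40_149
workers = 14
entities = 328_900_608

per_worker_hour = 60 / minutes_per_period         # 24 periods per worker-hour
worker_hours = periods * minutes_per_period / 60  # ~1,672.9 worker-hours
calendar_hours = worker_hours / workers           # ~119.5 hours on 14 workers
per_period = entities / periods                   # 8,192 entities per period
print(per_worker_hour, round(calendar_hours, 1), per_period)
```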
    • 31. WPF Data Visualization Application
      Demo
