
NASA Goddard: Head in the Clouds

  1. AWS Government, Education, and Nonprofit Symposium | Washington, DC | June 25-26, 2015
     NASA Goddard: Head in the Clouds
     Dan Duffy, NASA
     Steve Orrin, Intel
     Tim Carroll, Cycle Computing
     ©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. Fastest growing workloads
     • Fraud Detection
     • Risk Modeling
     • Drug Design
     • Genomics
     • Modeling and Simulation
     • Unstructured Data Analysis, Data Lakes
  3. Most resource intensive: 1 core → 8 cores → 8 servers → 10–10,000 servers
  4. Great, so… what’s the problem?
  5. The challenge of fixed capacity
     [Chart: capability over time, comparing fixed internal capacity at the system and organization level]
  6. Transform/life sciences
     The problem in 2013:
     • Cancer research needed 50,000 cores, not available in-house
     The options they didn’t choose:
     • Buy infrastructure: spend $2M, wait 6 months
     • Write software for 9–12 months for this 1 app
     Solution:
     • Created a 10,600-server cluster
     • 39.5 years of computing in 8 hours
     • Found 3 potential drug candidates!
     • Total infrastructure bill: $4,372
  7. Cycle powers cloud BigData and BigCompute
     [Diagram: data workflow, cloud orchestration, analytics, modeling, internal compute, compute burst]
     Software required to drive analytics and simulation at scale:
     • Easy access
     • Highly automated
     • On-demand
     • Ask the right questions
  8. Best way to try it… try it.
     Tim@cyclecomputing.com
  9. Measure Woody Biomass on South Side of the Sahara at the 40–50 cm Scale Using AWS
     Overview of the NASA Head in the Clouds Project presented at the Amazon Web Services Public Summit 2015
     Daniel Duffy – daniel.q.duffy@nasa.gov – on Twitter @dqduffy
     High Performance Computing Lead at the NASA Center for Climate Simulation (NCCS) – http://www.nccs.nasa.gov and @NASA_NCCS
     Goddard Space Flight Center (GSFC) – http://www.nasa.gov/centers/goddard/home/
  10. ESD Project Won Intel Head in the Clouds Challenge Award to Estimate Biomass in the South Sahara
      Project Goal
      • Use NGA data to estimate tree and bush biomass over the entire arid and semi-arid zone on the south side of the Sahara
      Project Summary
      • Estimate carbon stored in trees and bushes in the arid and semi-arid south Sahara
      • Establish a carbon baseline for later research on expected CO2 uptake on the south side of the Sahara
      Principal Investigators
      • Dr. Compton J. Tucker, NASA Goddard Space Flight Center
      • Dr. Paul Morin, University of Minnesota
      [Image: NGA 40 cm imagery with automated tree and shrub recognition, showing tree crowns and shadows]
  11. Partners and Resources
      Intel
      • Professional Services and Funding for AWS Resources
      Amazon Web Services (AWS)
      • Compute and storage
      • Support to set up the environment
      Cycle Computing
      • Cloud Resource Management Software
      • Services to install and configure the software
      Climate Model Data Services (CDS – GSFC Code 600)
      • NGA data support
      NASA Center for Climate Simulation (NCCS – GSFC Code 606.2)
      • System administration, application support, and data movement
      NASA CIO
      • General cloud consulting and coordination support
  12. Existing Sub-Saharan Arid and Semi-Arid Sub-Meter Commercial Imagery
      • 9,600 strips (~80 TB) to be delivered to GSFC
      • ~1,600 strips (~20 TB) at GSFC
      [Map: Area of Interest (AOI) for sub-Saharan arid and semi-arid Africa]
  13. The DigitalGlobe Constellation
      The entire archive is licensed to the USG
      • GeoEye
      • QuickBird
      • IKONOS
      • WorldView-1
      • WorldView-2
      • WorldView-3 (available Q1 2015)
  14. Panchromatic and multispectral mapping at the 40- and 50-cm scale
  15. Use Niger as the test case
      NGA data over Niger
      • Currently have about 16,000 total scenes covering Niger (the data is already orthorectified)
      • For this test case, approximately 3,120 scenes need to be processed to generate the vegetation index
      • Each scene is approximately 30,000 x 30,000 data points (pixels)
      • Will break each scene up into 100 tiles (3,000 x 3,000)
      Where is the data?
      • Data currently resides within the NCCS and in AWS
      Additional data
      • If we are successful and have additional time and resources, other African areas can be studied
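
To make the per-scene step above concrete, here is a minimal sketch in Python of tiling a 30,000 x 30,000 scene into 100 tiles and computing a normalized-difference vegetation index per tile. The array layout, band positions, and the exact index are illustrative assumptions, not the project's actual algorithm.

```python
# Minimal sketch, not the project's code: tile a (bands, 30000, 30000) scene into
# 100 tiles of 3000 x 3000 pixels and compute an NDVI-style index for each tile.
# Band positions (red=2, NIR=3) are assumptions for illustration.
import numpy as np

TILE = 3000  # tile edge in pixels; 30,000 / 3,000 = 10 tiles per side, 100 per scene

def iter_tiles(scene):
    """Yield (row, col, view) for each 3000 x 3000 tile of a (bands, H, W) array."""
    _, h, w = scene.shape
    for r in range(0, h, TILE):
        for c in range(0, w, TILE):
            yield r // TILE, c // TILE, scene[:, r:r + TILE, c:c + TILE]

def vegetation_index(tile, red=2, nir=3):
    """Normalized difference (NIR - red) / (NIR + red); band indices are placeholders."""
    red_band = tile[red].astype(np.float32)
    nir_band = tile[nir].astype(np.float32)
    return (nir_band - red_band) / np.clip(nir_band + red_band, 1e-6, None)
```
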
  16. Processing requirements
      Based on tests run in the NCCS private cloud, the following processing requirements were estimated:
      • The tests were run on a single-core (Intel E5-2670 2.5 GHz processor) virtual machine with 2 GB of memory
      • Each of the 3,120 scenes is broken up into 100 tiles
      • Each tile took 24 minutes
      • Hence, one scene will take 24 * 100 = 2,400 minutes of total processor time (about 40 hours on a single core)
      • Tiles and scenes can be run in parallel
      • Total tiles to process = 312,000
      • Total compute hours = 124,800
      Target completion time
      • Completing in 1 month will take between 175 and 200 virtual machines running non-stop
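
The sizing above reduces to simple arithmetic; a quick check, assuming one tile per single-core VM and roughly 720 wall-clock hours in a month:

```python
# Back-of-the-envelope check of the slide's sizing numbers.
scenes = 3120
tiles_per_scene = 100
minutes_per_tile = 24

total_tiles = scenes * tiles_per_scene                      # 312,000 tiles
core_hours = total_tiles * minutes_per_tile / 60            # 124,800 compute hours
hours_per_scene = tiles_per_scene * minutes_per_tile / 60   # 40 hours per scene on one core

vms_for_one_month = core_hours / (30 * 24)                  # ~173 single-core VMs running non-stop
print(total_tiles, core_hours, hours_per_scene, round(vms_for_one_month))
```
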
  17. Input and output data
      Input data
      • Total input of about 8 TB for the 3,120 scenes
      • Average of about 2.63 GB of data per scene
      • Average of about 26.3 MB of data per tile
      Intermediate data products
      • Unsure how much intermediate data is needed; this will impact the amount of temporary space required for each run
      Output data products
      • Total output data is estimated to be 25% of the input data
      • Estimated total output is about 2 to 3 TB
      • Output data will be transferred back to the NCCS
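
The data-volume averages follow the same pattern; a small sketch of the arithmetic (unit conversions are approximate, and the 25% output fraction is the slide's own estimate):

```python
# Rough data-volume arithmetic matching the slide's averages.
scenes = 3120
tiles_per_scene = 100
input_tb = 8.0

gb_per_scene = input_tb * 1024 / scenes               # ~2.6 GB per scene
mb_per_tile = gb_per_scene * 1024 / tiles_per_scene   # ~27 MB per tile (slide quotes ~26.3 MB)
output_tb = input_tb * 0.25                           # ~2 TB of products back to the NCCS
print(round(gb_per_scene, 2), round(mb_per_tile, 1), output_tb)
```
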
  18. Cluster configuration requirements
      • Number of cores (how many cores are required on a single node for the application?): 1 per tile
      • Amount of memory (how much RAM per node or per core is required?): 2 GB per tile
      • Operating system (what OS does the application need?): Linux (CentOS or Debian)
      • Libraries/tools/software (additional libraries, compilers, or commercial software to install?): none
      • Parallelization (threaded, MPI, or multiple instances of the application?): inherently parallel processing of each scene and/or tile
      • Cluster size (how many nodes are required?): 175–200 to complete in 1 month; more can be used
      • Storage per run (input, intermediate, and output): total input 8 TB (approx. 2.6 GB per scene); intermediate to be determined; total output back to the NCCS ~2 TB (approx. 25% of total input)
      • Shared storage (must storage be shared across all nodes?): no
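
In the project, Cycle Computing's software provisioned the cluster against these requirements. Purely as a hypothetical illustration, requesting a fleet of single-core Linux workers with the AWS SDK for Python (boto3) might look like the sketch below; the AMI, instance type, key pair, and security group are placeholders, not values from the project.

```python
# Hypothetical sketch only; the project used Cycle Computing's orchestration rather
# than raw API calls. All identifiers below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",            # placeholder: CentOS/Debian image with the application installed
    InstanceType="m3.medium",          # single-vCPU class with >2 GB RAM, matching the per-tile requirement
    MinCount=175,                      # sized from ~124,800 core-hours over ~720 hours in a month
    MaxCount=200,
    KeyName="hitc-workers",            # placeholder key pair
    SecurityGroupIds=["sg-xxxxxxxx"],  # placeholder security group
)
print(len(response["Instances"]), "worker instances requested")
```
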
  19. Workflow
      • NGA data external to NASA (PGC, DigitalGlobe) and NGA data at NASA are copied into the NCCS Science Cloud (internal cloud) NGA data repository on a shared file system.
      • Virtual machines in the internal cloud can read the data directly from the shared disk; no additional data movement is required.
      • The Cycle Computing DataMan data transfer software is used to stage the data to be processed into Amazon S3.
      • A resource manager (batch queue) runs in AWS; scientists interact with and launch jobs through the Cycle Computing system directly in AWS, which launches the virtual machines.
      • Data is moved from Amazon S3 to the local storage of the AWS VMs for processing; products can be stored in S3 for transfer to the NCCS at a later time.
      • After the job is completed, the results are copied back to the NCCS.
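
The per-VM part of this workflow (stage a tile from Amazon S3 to local disk, run the science step, push the product back to S3 for later transfer to the NCCS) could be sketched as follows; the bucket name, key layout, and processing stub are hypothetical, not the project's implementation.

```python
# Illustrative worker loop; bucket/key names are made up and the science step is a stub.
import os
import shutil
import boto3

s3 = boto3.client("s3")
BUCKET = "hitc-niger-staging"  # hypothetical S3 staging bucket

def run_vegetation_index(src, dst):
    # Stand-in for the actual per-tile processing (see the index sketch earlier).
    shutil.copyfile(src, dst)

def process_tile(key):
    local_in = os.path.join("/tmp", os.path.basename(key))
    local_out = local_in + ".vi"

    s3.download_file(BUCKET, key, local_in)     # S3 -> VM local storage
    run_vegetation_index(local_in, local_out)   # compute the product locally
    s3.upload_file(local_out, BUCKET, "products/" + os.path.basename(local_out))  # product back to S3

    os.remove(local_in)
    os.remove(local_out)
```
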
  20. Timeline (December through September)
      • Bi-weekly tag-ups
      • Requirements/scope
      • Setup/configuration
      • Test runs
      • Transfer data to S3
      • Configure S3 buckets
      • Production runs
      • Analysis
      • Final report
  21. Why use Cycle Computing and AWS?
      • The bigger goal is to analyze the entire arid and semi-arid zone on the south side of the Sahara
        – About 80 TB, 10x the data that the initial project will analyze
      • On 200 virtual machines, this would take about 10 months! How can we accelerate this?
      • The number of virtual machines can easily be scaled up using the Cycle Computing software and AWS resources
        – Once the data is in AWS, 80 TB of data can be analyzed in approximately the same amount of time as 8 TB of data
        – Scientists really love this part!
      • It might take longer given that the data transfers take time, but data transfers and computation can be overlapped
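
The acceleration argument is again straightforward arithmetic, assuming compute scales linearly with data volume and VM count, and setting transfer time aside as the slide notes:

```python
# Wall-clock months for the full ~80 TB job as a function of VM count.
core_hours_8tb = 124_800
core_hours_80tb = 10 * core_hours_8tb  # ~10x the data

def months(vm_count):
    return core_hours_80tb / vm_count / (30 * 24)

print(round(months(175), 1))   # ~9.9 months at the original fleet size (the slide's ~10 months)
print(round(months(1750), 1))  # ~1.0 month if the fleet is scaled up 10x as well
```
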
  22. Thanks go to the following…
      NASA
      • Dr. Compton Tucker (Co-PI)
      • Katherine Melocik (GSFC)
      • Jennifer Small (GSFC)
      • Dr. Tsengdar Lee (HQ)
      • Daniel Duffy (GSFC)
      • Mark McInerney (GSFC)
      • Hoot Thompson (GSFC)
      • Garrison Vaughn (GSFC)
      • Brittany Wills (GSFC)
      • Scott Sinno (GSFC)
      • Ray Obrien (ARC)
      • Richard Schroeder (ARC)
      • Milton Checchi (ARC)
      University Partners
      • Paul Morin (Co-PI, Univ. Minnesota)
      • Claire Porter (Univ. Minnesota)
      • Jamon Van Den Hoek (Oak Ridge)
      Cycle Computing
      • Tim Carroll
      • Michael Requa
      • Carl Chesal
      • Bob Nordlund
      • Glen Otero
      • Rob Futrick
      AWS
      • Jamie Baker
      • Jeff Layton
      There are others… my apologies to those I missed. These are typically the ones on our conference calls!
  23. Thank You.
      This presentation will be loaded to SlideShare the week following the Symposium.
      http://www.slideshare.net/AmazonWebServices

Editor's Notes

  • Key Points:
    Multi-billion-dollar corporations committed to getting better answers faster
    (key on the hook)
