In this slidecast, Jason Stowe from Cycle Computing describes how the company enabled HGST to spin up a 70,000-core cluster on AWS and return it 8 hours later.
Watch the video presentation: http://wp.me/p3RLHQ-dAG
3. HGST, A Western Digital Company
Transforming design of drives that hold the world’s data
• The Science
– The Problem: 30 days to finish the run in-house, stopping other work
– Engineering advanced drive heads by running 1 million simulations of potential designs
– 1 million simulations = a sweep of 22 design parameters on three different media types
• The Business
"At every step, we are innovating with purpose and pace to exceed the expectations of our customers"
– Mike Cordano, President
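A sweep like the one described above can be enumerated mechanically. Below is a minimal sketch in Python; the parameter names and values are hypothetical (the slide gives only the counts: 22 parameters, 3 media types, ~1 million designs), and only a 3-parameter subset is shown to illustrate the pattern.

```python
from itertools import product

# Hypothetical design grid -- the real sweep covered 22 parameters,
# whose names and levels are not given in the slide.
grid = {
    "pole_width_nm": [20, 25, 30],
    "gap_nm":        [5, 10],
    "coil_turns":    [4, 6, 8],
}
media_types = ["media_A", "media_B", "media_C"]  # three media types, per the slide

# One simulation job per (media type, parameter combination).
jobs = [dict(zip(grid, values), media=m)
        for m in media_types
        for values in product(*grid.values())]

print(len(jobs))  # 3 media types x (3 * 2 * 3) combinations = 54 jobs
```

With the full 22-parameter grid, the same cross-product structure is what expands to roughly a million independent jobs, which is what makes the workload embarrassingly parallel and a natural fit for a large Spot-Instance cluster.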
4. “Gojira” Run – Facts and Figures
World’s Largest Fortune 500 cloud run
Metric                  Count
Compute Hours of Work   619,748 hours
Compute Years of Work   70.75 years
Design Count            ~1 million drive head designs
Run Time                8 hours, not 30 days in-house
Applications Used       MRM/MATLAB, CycleCloud, Chef
Max Scale (cores)       70,908 AWS cores, 3 regions
Max Scale (instances)   5,689 Spot Instances at peak
Computing Power         729 TeraFLOPS rPeak, more than #63 on the Top500 rPeak list
Infrastructure Costs    AWS Spot Instances: $5,594
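The headline figures hang together; a quick back-of-the-envelope check in Python:

```python
# Sanity-check the "Gojira" run figures quoted above.
compute_hours = 619_748
print(round(compute_hours / 8_760, 2))      # hours -> years (8,760 h/yr): 70.75

in_house_hours = 30 * 24                    # 30-day in-house run, in hours
print(in_house_hours // 8)                  # vs. an 8-hour cloud run: 90x throughput

spot_cost = 5_594                           # USD, AWS Spot Instances
print(round(spot_cost / compute_hours, 4))  # ~$0.009 per core-hour
```

That last line is the striking one: roughly nine-tenths of a cent per core-hour for burst capacity the company did not have to own.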
5. The Value of Timing
Technical computing is the new enterprise workload
Technology Timing | Significance
Told about this workload on Wednesday, ran by the weekend | Our software and cloud enable fast-turnaround work at scale
0 to 50,000 cores in 23 minutes | Can tackle problems at a scale 100x bigger than in-house, in minutes
8 hours, instead of 30 days | 90x throughput, faster business results
729 TeraFLOPS cluster in 60 minutes | AWS Spot enabled access for $5,593.94
All Ivy Bridge processors | Moore's Law helps HGST
6. What’s different about this run?
• New Enterprise Scale: the world's largest cloud run by a Fortune 500 company; R&D can now ask the right questions, because there are no scale limits
• New Industry: a leader in manufacturing, reflecting broad enterprise adoption of cloud cluster computing
• New Agility: CycleCloud acquired and vetted 50,000 cores in 23 minutes, controlling all regions from one instance of the software
• New Processor: 50% more FLOPS per Ivy Bridge core than in the year-ago MegaRun