Utility HPC:
Right Systems, Right Scale,
Right Science
Jason Stowe, CEO
@jasonastowe, @cyclecomputing
I’m here to recruit you,
for a cause
We believe
utility access to compute power
makes impossible science,
possible.
Dynamic, utility access to
compute power
is as important as uptime
(that’s why coded infrastructure
is critical)
Skeptical?
Flickr: Tourist on Earth
In prior years (today?)
Researchers/engineers waited
for computing
For the horsepower
For the place to put it
For it to be configured…
Flickr: vaxomatic
Yesterday, high-performance
engineering and science clusters
were…
Too small
when you need it most,
Too large
every other time.
The Innovation Bottleneck:
Researchers/Scientists/Engineers
forced to size questions to the
infrastructure they have
 
Multi-tenant systems create float capacity
that is critical to innovation
From centralized to decentralized, collaborative to independent, and right back again!

The 60's: Mainframes (100% sharing, ~0 Mbit)
The 70's: VAX (60% sharing, ~1 Mbit)
The 80's: The PC (0% sharing, ~10 Mbit)
The 90's: Beowulf Clusters (40% sharing, ~1,000 Mbit)
The 00's and 10's: Central Clouds (??? % sharing, ~10,000 Mbit)
Bigger and better, but further and further away from the scientist's lab
The Scientific Method:
Ask a Question → Hypothesize → Predict → Experiment/Test → Analyze → Final Results
Test and Analyze stages
require the most time,
compute, and data
The Scientific Method:
Ask a Question → Hypothesize → Predict → Experiment/Test → Analyze → Final Results
Any improvements to this
cycle yield multiplicative
benefits
A Challenge Across Industries
— 3 of Top 5 Insurance
— 6 of Top 8 Pharmaceutical
— 2 of Top 3 Banks
— 2 of Top 3 Genomics Sequencing
— 1 of Top 2 FPGA
Utility HPC in the News
WSJ, NYTimes, Wired, Bio-IT World, BusinessWeek
To accelerate science, we need
automation
Utility HPC Cluster
(Diagram: management software orchestrating CC1/CCG instances, EBS volumes, S3, and a shared FS)
- Scales to 50,000+ cores
- Data scheduling
- Workload portability
(Diagram: a user submits to a secure HPC cluster; data- and application-aware movement feeds a traditional scheduler; massive scale based upon workload; HPC reporting & audit)
ChefConf 2012: 50,000-core CycleCloud using Chef and AWS
ChefConf 2013: 10,600-instance cluster against a cancer target
- Created in 2 hours
- Configured with Chef Search and Data bags
- One Chef 11 server
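To make "configured with Search and Data bags" concrete, here is a minimal sketch of that pattern in a Chef 11 recipe. The cookbook, data bag, role, and the HTCondor scheduler are illustrative assumptions on my part, not Cycle's actual code:

    # Hypothetical recipe: configure one HPC worker node.
    # Pull cluster-wide settings from a (made-up) data bag...
    settings = data_bag_item('hpc_cluster', 'scheduler')

    # ...and discover peers via Chef search, so nodes self-assemble
    # into the cluster instead of relying on hand-edited host lists.
    workers = search(:node, 'role:hpc_worker')

    template '/etc/condor/condor_config.local' do
      source 'condor_config.local.erb'   # shipped in the cookbook
      variables(
        scheduler_host: settings['host'],
        worker_ips:     workers.map { |n| n['ipaddress'] }
      )
      notifies :restart, 'service[condor]'
    end

    service 'condor' do
      action [:enable, :start]
    end

Search is what lets a 10,600-instance cluster converge without any node knowing the others in advance; the single Chef 11 server is the one source of truth.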
We make software tools to easily orchestrate complex
workloads and data access across Utility HPC.
Today's talk is a survey of use cases…
- Life Science: 10,600-instance molecular modeling
- Manufacturing: 600-core nuclear power plant safety simulation
- Genomic Analysis: RNA for stem cells
Dynamic, utility access to
compute power
is as important as uptime
Why?
#1: “Better” Science =
“Answer the question we want to
ask”, not constrained to what fits
on local compute power
#2: "Faster" Science =
Run this "better" science,
which would have taken
months or years,
in hours or days
Survey of Use Cases
☑  Drug Design
☑  CAD/CAM
☑  Genomics
…
Life Sciences & Compute?
(Illustrative chart, admittedly "fake charts with fake data": compute vs. data/bandwidth needs for genomics, molecular modeling, CAD/CAM, all-sample analysis, proteomics, biomarker/image analysis, and sensor data import)
Why is this important?
(W.H.O./Globocan 2008)
~2 million Type 2 diabetics,
~200k Type 1
Every day is
crucial and costly
Before:
Trade-off compute time vs.
accuracy
Now:
Accurate analysis, fewer false
negatives, faster
Process for Drug Design:
Initial Coarse Screen → Higher Quality Analysis → Best Quality
Big 10 Pharma
Built a 10,600-instance cluster
($44M of equivalent hardware) in 2 hours;
ran 40 years of science
in 11 hours for $4,372
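A quick sanity check on those figures (the per-instance core count is my assumption, not stated in the deck): 40 years of compute is roughly 40 × 8,760 ≈ 350,000 core-hours, and finishing in 11 hours implies about 32,000-way parallelism, consistent with 10,600 instances averaging ~3 cores each.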
Most Recent Utility Supercomputer
(Screenshots: server count, the AWS Console view, and Cycle's view of this cluster; all driven by one Chef 11 server)
Earlier Drug Design
Novartis, discussed at BioIT 2012
— Needed
—  Push-button utility supercomputer for molecular modeling
— Created
—  30,000-core run across US/EU cloud (AWS)
—  10 years of compute in 8 hours for $10,000
—  Found 3 compounds, now in the wet lab as a result
Lessons learned
—  Capacity is no longer an issue
—  Hardware = software
—  Testing matters (error handling, unit testing, etc.);
e.g., Cycle spent ~$1M on AWS over 5 years
—  The only way to do this is to automate
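On the unit-testing point, a minimal sketch of what that can look like with Chef 11-era ChefSpec, reusing the same hypothetical hpc_worker cookbook and condor names from above (this is one plausible shape, not Cycle's actual test suite):

    require 'chefspec'

    describe 'hpc_worker::default' do
      # ChefSpec converges the recipe entirely in memory, so failures
      # surface before any real (metered) instance ever boots.
      let(:chef_run) { ChefSpec::Runner.new.converge(described_recipe) }

      before do
        # Stub the data bag and search calls the recipe makes,
        # so the test needs no live Chef server.
        stub_data_bag_item('hpc_cluster', 'scheduler')
          .and_return('host' => 'scheduler.example.com')
        stub_search(:node, 'role:hpc_worker').and_return([])
      end

      it 'renders the local scheduler config' do
        expect(chef_run).to create_template('/etc/condor/condor_config.local')
      end

      it 'enables and starts the scheduler service' do
        expect(chef_run).to enable_service('condor')
        expect(chef_run).to start_service('condor')
      end
    end

Run with rspec; at ~$1M of metered spend, catching a bad converge in a unit test is far cheaper than catching it on 10,000 running instances.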
 
Servers are not house plants.
Servers are wheat.
Survey of Use Cases
☑  Drug Design
☑  CAD/CAM
☑  Genomics
…
Nuclear Power Plant simulation
We don't know what they're
running, but it has "Safety"
600-core CAD/CAM
A wait of 3 quarters of a year became 3 weeks
(Diagram: an engineer behind the corporate firewall; site data is scheduled out to a secure ~600-CPU HPC cluster with a multi-TB filesystem in the external cloud; 3 weeks instead of 3 quarters)
Survey of Use Cases
☑  Drug Design
☑  CAD/CAM
☑  Genomics
…
Gene Expression Analysis
Morgridge Institute for Research
Run a holistic comparison of all 78 terabytes of stem cell
RNA samples to build a unique gene expression
database
Make it easier to replicate disease in petri dishes with
induced stem cells
78 TB of Stem Cell RNA
1 Million compute hours,
115 years of computing in
1 week for $19,555
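Those figures are self-consistent: 115 years ≈ 115 × 8,760 ≈ 1.0 million compute-hours, and 1 million hours finished in one week (168 hours) implies roughly 6,000-way parallelism, matching the 5,000 to 10,000 cores cited on the next slide.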
Gene Expression Analysis
Morgridge Institute for Research
— Cluster details
—  5,000 to 10,000 cores for a week
—  Very long individual analyses were checkpointed,
making Spot instance usage possible
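Checkpointing is exactly what makes interruptible Spot capacity safe: a terminated instance loses at most one interval of work. A toy sketch of the idea in Ruby (the file names and per-sample loop are my assumptions, not Morgridge's actual pipeline):

    CHECKPOINT = 'analysis.checkpoint'

    # Stand-in for one long-running RNA analysis step.
    def process(sample_path)
      sleep 1
    end

    # Resume from the last saved state if an interrupted run left one.
    state = File.exist?(CHECKPOINT) ? Marshal.load(File.binread(CHECKPOINT)) : { next_sample: 0 }

    samples = Dir.glob('samples/*.rna').sort

    (state[:next_sample]...samples.size).each do |i|
      process(samples[i])
      state[:next_sample] = i + 1
      # Write to a temp file and rename, so a Spot termination
      # mid-write cannot corrupt the checkpoint.
      File.binwrite("#{CHECKPOINT}.tmp", Marshal.dump(state))
      File.rename("#{CHECKPOINT}.tmp", CHECKPOINT)
    end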
Survey of Use Cases
☑  Drug Design
☑  CAD/CAM
☑  Genomics
…
Code can accelerate Science
The Scientific Method on Utility HPC:
Ask a Question → Hypothesize → Predict → Experiment/Test → Analyze → Final Results
yields "Better", "Faster"
research for less $
Dynamic, utility access to
compute power
is as important as uptime
I’m here to recruit you,
for a cause
Contribute to Chef.
Make the community better.
And you will help Cycle
make impossible science,
possible.
2013 BigScience Challenge
$10,000 of free computing to science
benefitting humanity
2012 winner: 115-year genomic analysis
Enter at:
http://cyclecomputing.com/big-science-challenge/enter
Thank You! Questions?
