What do risk modeling and analytics in financial services have in common with large-scale computing in high energy physics? Come to this session to hear how financial services customers like Aon are taking advantage of new approaches, such as predictive analytics and AI/deep learning on AWS, to perform risk modeling, and how Brookhaven National Laboratory is using tens of thousands of cores to do large-scale grid computing for Monte Carlo simulations in high energy physics. We will also showcase how the CSIRO eHealth team in Australia is innovating with serverless architectures, using AWS Lambda for personalized medicine and genomics.
Speakers: Adrian White, Sr SciCo Technical Manager, Amazon Web Services
2. “… Programmers tend to set the size of problems
to fully exploit the computing power that becomes
available as the resources improve.”
Gustafson's law
https://en.wikipedia.org/wiki/Gustafson's_law
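Gustafson's law is usually written as a scaled speedup S(N) = s + (1 - s) × N, where s is the fraction of run time spent in serial work on the N-processor system. A minimal Python sketch of that formula (the function name is illustrative):

```python
def scaled_speedup(serial_fraction: float, processors: int) -> float:
    """Gustafson's law: S(N) = s + (1 - s) * N.

    serial_fraction (s) is the share of the run time spent in serial code when
    the scaled-up problem runs on `processors` (N) processors.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return serial_fraction + (1.0 - serial_fraction) * processors


if __name__ == "__main__":
    # With 10% serial work, growing the problem with the machine keeps speedup near-linear.
    for n in (1, 16, 256, 4096):
        print(n, scaled_speedup(0.1, n))
```

The key assumption is that run time stays fixed while the problem grows with the available resources, which is exactly the behaviour the quote describes.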
9. “Invention requires two things: the ability to try a
lot of experiments, and not having to live with
the collateral damage of failed experiments.”
– Andy Jassy, Amazon Web Services
10. Let’s Move This Model To AWS
[Architecture diagram: head node, auto scaling compute grid, Amazon Direct Connect, remote visualisation, S3, RDS, EFS, AWS Snowball]
11. Scale Grids Based On Work To Be Done
[Architecture diagram: head node, S3, RDS, EFS, Amazon Direct Connect, remote visualisation, AWS Snowball]
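One simple way to scale a grid on the work to be done is to size it from the number of queued jobs. The sketch below is a hypothetical rule (the function name, `jobs_per_instance`, and the bounds are illustrative assumptions, not an AWS API):

```python
import math


def desired_instances(queue_length: int, jobs_per_instance: int,
                      min_instances: int = 0, max_instances: int = 1000) -> int:
    """Hypothetical scaling rule: enough instances to drain the queue, within bounds."""
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    # Round up so a partially filled instance is still provisioned.
    needed = math.ceil(queue_length / jobs_per_instance)
    # Clamp to a floor (warm capacity) and a ceiling (budget or quota limit).
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    for queued in (0, 10, 500, 100_000):
        print(queued, desired_instances(queued, jobs_per_instance=4, max_instances=200))
```

In practice the queue length would come from the scheduler or a monitoring metric, and the result would feed an auto scaling group's desired capacity.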
12. Add Entire Compute Grids On-demand
[Architecture diagram: two head nodes, with S3, RDS, EFS, Amazon Direct Connect, remote visualisation and AWS Snowball]
13. On-demand, Auto Scaling Clusters On AWS
• CfnCluster is provided by AWS to quickly provision configurable clusters and grid computing environments.
• AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot.
• Alces Flight is available in the AWS Marketplace (https://aws.amazon.com/marketplace/) and bundles 1000+ commonly used applications.
18. An Aside: Monte Carlo Methods
Monte Carlo methods vary, but tend to follow
a particular pattern:
1. Define a domain of possible inputs
2. Generate inputs randomly from a probability
distribution over the domain
3. Perform a deterministic computation on the inputs
4. Aggregate the results
Monte Carlo simulations require many, many independent iterations, which can run in parallel.
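The four steps above can be sketched with the textbook example of estimating π. This is a generic illustration, not any of the workloads in this talk. Each chunk is independent, so chunks could be dispatched as separate grid jobs; here they run in a loop, and seeding each chunk by its index is a simplifying assumption:

```python
import random


def run_chunk(n_samples: int, seed: int) -> int:
    """Steps 1-3: draw points uniformly from the unit square (the domain) and
    count how many fall inside the quarter circle (a deterministic test)."""
    rng = random.Random(seed)  # separate generator per chunk
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples: int, n_chunks: int) -> float:
    """Step 4: aggregate the per-chunk counts into a single estimate."""
    per_chunk = total_samples // n_chunks
    hits = sum(run_chunk(per_chunk, seed) for seed in range(n_chunks))
    return 4.0 * hits / (per_chunk * n_chunks)


if __name__ == "__main__":
    print(estimate_pi(200_000, n_chunks=8))
```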
19. Scale with AWS: Brookhaven Labs
• More than 500 million events were fully
simulated using Monte Carlo methods in 10
days using 2.9 million jobs
• The same work would have taken 6 weeks on-premises
• Used multiple AWS regions to minimise storage
costs and improve latency for data access
• The HEP Cloud project added 58,000 vCPUs
elastically to their on-premises facility for the
CMS experiment
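Some back-of-envelope arithmetic on the Brookhaven figures above. It treats "more than 500 million events" as exactly 500 million, so the derived rates are lower bounds:

```python
EVENTS = 500_000_000   # "more than 500 million events" (treated as a lower bound)
JOBS = 2_900_000       # "2.9 million jobs"
AWS_DAYS = 10          # completed in 10 days on AWS
ON_PREM_DAYS = 6 * 7   # "6 weeks" on-premises

events_per_job = EVENTS / JOBS
events_per_second = EVENTS / (AWS_DAYS * 24 * 3600)
time_reduction = ON_PREM_DAYS / AWS_DAYS

print(f"{events_per_job:.0f} events/job, "
      f"{events_per_second:.0f} events/s, "
      f"{time_reduction:.1f}x shorter wall-clock time")
```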
20. Compute Strategies at Scale
Blend commercial models: reserved, on-demand and Spot capacity.
1. Reserve your baseline capacity
2. Supplement with on-demand
3. Aggressively use Spot capacity for suitable workloads
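The blending strategy above can be sketched as a simple allocation. All names and hourly rates here are hypothetical, and real planning would also weigh interruption risk and reservation terms. Note that reserved capacity is paid for even when idle, which is why only the baseline should be reserved:

```python
from dataclasses import dataclass


@dataclass
class CapacityPlan:
    reserved_used: int
    reserved_idle: int  # reserved capacity is paid for even when unused
    spot: int
    on_demand: int


def plan_capacity(demand: int, reserved_baseline: int, spot_tolerant: int) -> CapacityPlan:
    """Fill demand from reserved capacity first, then Spot for work that can
    tolerate interruption (spot_tolerant, in instances), then on-demand."""
    reserved_used = min(demand, reserved_baseline)
    remaining = demand - reserved_used
    spot = min(remaining, spot_tolerant)
    on_demand = remaining - spot
    return CapacityPlan(reserved_used, reserved_baseline - reserved_used, spot, on_demand)


def hourly_cost(plan: CapacityPlan, reserved_rate: float,
                spot_rate: float, on_demand_rate: float) -> float:
    reserved = plan.reserved_used + plan.reserved_idle
    return reserved * reserved_rate + plan.spot * spot_rate + plan.on_demand * on_demand_rate


if __name__ == "__main__":
    # Hypothetical per-instance hourly rates.
    plan = plan_capacity(demand=100, reserved_baseline=40, spot_tolerant=50)
    print(plan, hourly_cost(plan, reserved_rate=0.06, spot_rate=0.03, on_demand_rate=0.10))
```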
21. Spot Exists in Different Markets
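C3 instance family: Spot price in each Availability Zone vs. on-demand price
Size   1b     1c     1a     On-demand
8XL    $0.27  $0.29  $0.50  $1.76
4XL    $0.30  $0.16  $0.21  $0.88
2XL    $0.07  $0.08  $0.08  $0.44
XL     $0.05  $0.04  $0.04  $0.22
L      $0.01  $0.04  $0.01  $0.11
Spot prices differ for: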
Each instance family
Each instance size
Each Availability Zone
In every region
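Because Spot prices differ per Availability Zone, the cheapest market for a given size can be far below on-demand. The sketch below uses the C3 prices from this slide, which are a point-in-time snapshot. It only takes the minimum across the three zones, so which column belongs to which zone does not affect the result:

```python
# C3 prices from the slide: Spot price in three Availability Zones, and on-demand.
SPOT = {
    "8XL": [0.27, 0.29, 0.50],
    "4XL": [0.30, 0.16, 0.21],
    "2XL": [0.07, 0.08, 0.08],
    "XL": [0.05, 0.04, 0.04],
    "L": [0.01, 0.04, 0.01],
}
ON_DEMAND = {"8XL": 1.76, "4XL": 0.88, "2XL": 0.44, "XL": 0.22, "L": 0.11}


def best_spot_saving(size: str) -> tuple[float, float]:
    """Return (cheapest Spot price across zones, fractional saving vs on-demand)."""
    cheapest = min(SPOT[size])
    return cheapest, 1.0 - cheapest / ON_DEMAND[size]


if __name__ == "__main__":
    for size in SPOT:
        price, saving = best_spot_saving(size)
        print(f"{size:>3}: Spot ${price:.2f} vs on-demand ${ON_DEMAND[size]:.2f} "
              f"({saving:.0%} saving)")
```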
22. Spot Strategies at Scale
1. Let Spot Fleet take care of the detail
• Lowest price vs diversified allocation
• Spot Fleet instance weighting
2. Make the cluster or workload manager “Spot aware”
• HTCondor & Condor Annex in HEP
• Toil workflow engine in Genomics & Life Sciences
3. Use services like Amazon ECS and AWS Batch to
manage the details
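The difference between the lowest-price and diversified allocation strategies, and the role of instance weighting, can be illustrated with a deliberately simplified model. This is not AWS's actual Spot Fleet algorithm. Pool names, prices and weights are hypothetical, and a weight stands for the capacity units (for example, vCPUs) one instance contributes:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str     # e.g. one instance type in one Availability Zone (hypothetical)
    price: float  # Spot price per instance-hour (hypothetical)
    weight: int   # capacity units one instance provides


def allocate(pools: list[Pool], target_units: int, strategy: str) -> dict[str, int]:
    """Simplified model: instance counts per pool for a weighted target capacity."""
    if strategy == "lowestPrice":
        # Put everything in the pool with the lowest price per capacity unit.
        best = min(pools, key=lambda p: p.price / p.weight)
        return {best.name: math.ceil(target_units / best.weight)}
    if strategy == "diversified":
        # Spread capacity evenly so one pool's interruption hits only a share of it.
        share = target_units / len(pools)
        return {p.name: math.ceil(share / p.weight) for p in pools}
    raise ValueError(f"unknown strategy: {strategy}")


def units(allocation: dict[str, int], pools: list[Pool]) -> int:
    weights = {p.name: p.weight for p in pools}
    return sum(count * weights[name] for name, count in allocation.items())


if __name__ == "__main__":
    pools = [Pool("pool-a", 0.10, 2), Pool("pool-b", 0.16, 4), Pool("pool-c", 0.30, 8)]
    for strategy in ("lowestPrice", "diversified"):
        alloc = allocate(pools, 24, strategy)
        print(strategy, alloc, units(alloc, pools))
```

The trade-off is cost vs. resilience: the lowest-price allocation is cheapest per unit, but all of its capacity is exposed to interruptions in a single pool.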
23. Use AWS Batch to Manage Spot
• Source data lands in S3
• S3 events trigger a Lambda function to add a job to the queue of runnable jobs
• AWS Batch runs the queued jobs on an ECS / Spot cluster
• A custom CloudWatch metric monitors queue length, and CloudWatch alarms trigger auto scaling of the ECS cluster
• Output data products are written to S3
• Additional analytics, e.g. Spark, Tez, Hive, MLlib
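The "S3 events trigger a Lambda function to add a job to the queue" step might look roughly like the Python Lambda handler below. It reads the bucket and key from the S3 event notification and calls the boto3 AWS Batch client's `submit_job`. The queue name, job definition and the `input_uri` parameter are hypothetical placeholders. The client can be injected so the sketch can be exercised without AWS credentials:

```python
import re
import urllib.parse

# Hypothetical names: substitute your own AWS Batch job queue and job definition.
JOB_QUEUE = "spot-job-queue"
JOB_DEFINITION = "process-input:1"


def _job_name(key: str) -> str:
    # Keep job names to letters, digits, hyphens and underscores (max 128 chars).
    return re.sub(r"[^A-Za-z0-9_-]", "-", key)[:128]


def s3_to_batch_handler(event, context, batch_client=None):
    """Submit one AWS Batch job per object described in an S3 event notification."""
    if batch_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        batch_client = boto3.client("batch")
    job_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = batch_client.submit_job(
            jobName=_job_name(key),
            jobQueue=JOB_QUEUE,
            jobDefinition=JOB_DEFINITION,
            # Available to the job definition's command as Ref::input_uri.
            parameters={"input_uri": f"s3://{bucket}/{key}"},
        )
        job_ids.append(response["jobId"])
    return {"submitted": job_ids}
```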
24. “No server is easier to
manage than ‘no server’.”
– Werner Vogels, Amazon.com
25. AWS Lambda – How it Works
Bring your own code
• Node.js, Java, Python
• Java = any JVM-based language, such as Scala, Clojure, etc.
• Bring your own libraries
Flexible invocation paths
• Event or RequestResponse invoke options
• Existing integrations with various AWS services
Simple resource model
• Select memory from 128MB to 1.5GB in 64MB steps
• CPU & network allocated proportionately to RAM
• Reports actual usage
Fine-grained permissions
• Uses an IAM role for Lambda execution permissions
• Uses a resource policy for AWS event sources
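The "bring your own code" and "Event or RequestResponse" points can be illustrated with a minimal Python sketch. The handler body is a hypothetical example. `invoke` wraps the boto3 Lambda client's `invoke` call, and the client is passed in so the sketch can be tried without AWS credentials:

```python
import json


def handler(event, context):
    """Hypothetical Python Lambda handler: Lambda passes the event payload and a context object."""
    values = event.get("values", [])
    return {"count": len(values), "sum": sum(values)}


def invoke(lambda_client, function_name: str, payload: dict, asynchronous: bool = False):
    """Call a function synchronously (RequestResponse) or asynchronously (Event)."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event" if asynchronous else "RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    if asynchronous:
        # Event invocations are queued; the caller only gets a status code (202).
        return response["StatusCode"]
    # RequestResponse waits for the function and returns its result as a stream.
    return json.loads(response["Payload"].read())
```

With a real client this would be called as, for example, `invoke(boto3.client("lambda"), "my-function", {"values": [1, 2, 3]})`, where the function name is a placeholder.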
26. CSIRO – Cloud-based CRISPR Prediction GT-Scan2
CRISPR/Cas9 technology provides genome editing capability.
This has applications in personalised medicine and agriculture.
CSIRO built GT-Scan2 to:
1. Better understand the science
2. Provide higher-powered computational tools
• Supercomputing-scale analysis
• Interactive, real-time analysis (query-style research)
[Figure: GT-Scan2, ranked choices]
27. CSIRO – CRISPR Search with AWS Lambda
GT-Scan2.0 is implemented as a
microservices architecture using
AWS Lambda
Serverless:
• Does not require users to have high-powered compute
Scalable:
• Can be easily scaled to
whole genome analysis
Also implemented as a “stand-alone” version:
• Can be run on local servers
• Can incorporate your own
ChIP-seq data rather than
public data
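Scaling a search to a whole genome with many parallel function invocations usually means splitting the sequence into chunks that overlap enough that no candidate site straddles a boundary unseen. The toy sketch below is not GT-Scan2's algorithm, which also ranks targets. It assumes SpCas9-style sites (a 20-nt protospacer followed by an NGG PAM), forward strand only, uppercase input, and scans chunks in a loop where each chunk could instead be one Lambda invocation:

```python
import re

GUIDE_LEN = 20              # SpCas9 protospacer length
SITE_LEN = GUIDE_LEN + 3    # protospacer + NGG PAM
# Zero-width lookahead so overlapping sites are all reported.
SITE = re.compile(r"(?=[ACGT]{21}GG)")


def find_sites(seq: str, offset: int = 0) -> list[int]:
    """Start positions of 20-nt protospacers followed by NGG (forward strand only)."""
    return [m.start() + offset for m in SITE.finditer(seq)]


def chunks(seq: str, size: int) -> list[tuple[int, str]]:
    """Split seq into chunks overlapping by SITE_LEN - 1, so every site fits in some chunk."""
    if size < SITE_LEN:
        raise ValueError("chunk size must be at least one site long")
    step = size - (SITE_LEN - 1)
    last_start = max(len(seq) - SITE_LEN + 1, 1)
    return [(start, seq[start:start + size]) for start in range(0, last_start, step)]


def scan_in_chunks(seq: str, size: int) -> list[int]:
    """Scan each chunk independently (as parallel workers would), then merge."""
    hits = set()
    for start, piece in chunks(seq, size):
        hits.update(find_sites(piece, offset=start))  # overlaps can repeat a hit
    return sorted(hits)
```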
28. Lambda in the Context of Grid Computing
Source: “Occupy the Cloud: Distributed Computing for the 99%”
https://arxiv.org/pdf/1702.04024.pdf
29. Cost Considerations for Lambda at Scale?
www.cloudhealthtech.com/blog/how-use-lambda-ec2-save-most-money
[Chart: cost per million executions vs. millions of function executions (1, 3, 6, 10, 40), comparing Lambda, on-demand instances and 3-year Reserved Instances]
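A rough cost model makes the comparison concrete. The Lambda prices below are assumptions based on the long-standing us-east-1 list prices ($0.20 per million requests, $0.00001667 per GB-second, duration billed in 100 ms increments at the time of this talk), so check current pricing. The EC2 hourly price is hypothetical. The break-even compares cost only and ignores how many executions one instance could actually serve:

```python
import math

# Assumed Lambda pricing (us-east-1 list prices at the time; verify before use).
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.00001667


def lambda_cost(executions: int, duration_ms: float, memory_mb: int,
                billing_increment_ms: int = 100) -> float:
    """Request charge plus compute charge, with duration rounded up per invocation."""
    billed_ms = math.ceil(duration_ms / billing_increment_ms) * billing_increment_ms
    gb_seconds = executions * (billed_ms / 1000) * (memory_mb / 1024)
    return executions * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


def break_even_executions(monthly_ec2_cost: float, duration_ms: float, memory_mb: int) -> float:
    """Executions per month at which Lambda costs as much as the given EC2 spend."""
    return monthly_ec2_cost / lambda_cost(1, duration_ms, memory_mb)


if __name__ == "__main__":
    ec2_monthly = 0.10 * 730  # hypothetical $0.10/hour instance, ~730 hours/month
    print(f"1M x 1s @ 1GB: ${lambda_cost(1_000_000, 1000, 1024):.2f}")
    print(f"break-even: {break_even_executions(ec2_monthly, 1000, 1024):,.0f} executions/month")
```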
32. GPUs for Risk Modelling & Hedging
The Challenge
Spinning up large numbers of GPUs quickly and
inexpensively to meet the financial modelling & reporting
needs of customers of ABSI (Aon Benfield Securities)
ABSI uses proprietary algorithms (Monte Carlo
simulations) running millions of times
The Solution
ABSI moved its infrastructure to AWS and retired its
co-located data centre
ABSI built a front-end on AWS for its processing solution that
automatically runs GPU instances on Amazon EC2 with EBS,
inside an Amazon VPC for security.
The Result
The new platform can be up to 500 times more efficient in
performance per dollar for some clients
“Using AWS helps us reduce a 10-
day process to 10 minutes. That’s
transformative: it broadens our ability
to discover.”
Peter Phillips
Managing Director, Aon Benfield Securities
UK-based Aon plc, the ultimate parent company of Aon
Benfield Securities, is a leading global provider of risk
management, insurance and reinsurance brokerage services.
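ABSI's algorithms are proprietary. As a generic illustration of why Monte Carlo risk work suits GPUs, the sketch below prices a European call option under geometric Brownian motion by simulation and checks it against the Black-Scholes closed form. All parameters are hypothetical. Each simulated path is independent, and that per-path loop is the part a GPU implementation would run in parallel; here it is plain CPU Python:

```python
import math
import random


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0: float, k: float, r: float, sigma: float, t: float) -> float:
    """Closed-form European call price, used here to check the simulation."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def monte_carlo_call(s0: float, k: float, r: float, sigma: float, t: float,
                     n_paths: int, seed: int = 0) -> float:
    """Average discounted payoff over independently simulated terminal prices."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):  # independent paths: the GPU-parallel part
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - k, 0.0)
    return math.exp(-r * t) * total / n_paths


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)  # spot, strike, rate, volatility, years
    print(monte_carlo_call(*args, n_paths=200_000), black_scholes_call(*args))
```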
33. “… Programmers tend to set the size of problems
to fully exploit the computing power that becomes
available as the resources improve.”
Gustafson's Law
https://en.wikipedia.org/wiki/Gustafson's_law
35. Further Information
AWS resources:
• High Performance Computing
• Financial Services Grid Computing on Amazon Web Services
• Research & Technical Computing on AWS
Blog posts and articles:
• Genome Engineering Applications: Early Adopters of the Cloud
• Experiment that Discovered the Higgs Boson Uses AWS to Probe Nature