CPAC Connectome Analysis in the Cloud

Harnessing cloud computing
for high capacity analysis of
neuroimaging data
Cameron Craddock, PhD
Computational Neuroimaging Lab
Center for Biomedical Imaging and Neuromodulation
Nathan S. Kline Institute for Psychiatric Research
Center for the Developing Brain
Child Mind Institute

Discovery science in Psychiatric Neuroimaging
1. Characterizing inter-individual variation in connectomes (Kelly et al.
2012)
2. Identifying biomarkers of disease state, severity, and prognosis
(Craddock 2009)
3. Re-defining mental health in terms of neurophenotypes, e.g. RDoC
(Castellanos 2013)
Data is often shared only in its raw form and must be preprocessed to remove
nuisance variation and to be made comparable across individuals and sites.

Configurable Pipeline for the Analysis of
Connectomes (CPAC)
• Pipeline to automate preprocessing and analysis
of large-scale datasets
• Most cutting edge functional connectivity
preprocessing and analysis algorithms
• Configurable to enable “plurality” – evaluate
different processing parameters and strategies
• Automatically identifies and takes advantage of
parallelism on multi-threaded, multi-core, and
cluster architectures
• “Warm restarts” – only re-compute what has
changed
• Open science – open source
• http://fcp-indi.github.io
Nipype
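
The "warm restart" idea above can be illustrated with a small sketch: key each step's result on a hash of its name, inputs, and parameters, and re-run only when that hash changes. This is not taken from C-PAC or Nipype; `run_step`, `smooth`, and the in-memory `_cache` are hypothetical names used only for illustration.

```python
import hashlib
import json

_cache = {}  # hypothetical in-memory store; a real pipeline would persist results to disk


def run_step(name, func, inputs, params):
    """Run func(inputs, **params) unless an identical call was already computed."""
    key = hashlib.sha256(
        json.dumps([name, inputs, params], sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = func(inputs, **params)
    return _cache[key]


calls = []  # records which parameter values actually triggered computation


def smooth(inputs, fwhm):
    """Stand-in for an expensive processing step."""
    calls.append(fwhm)
    return f"{inputs}_smoothed_{fwhm}"


run_step("smooth", smooth, "sub-01", {"fwhm": 6})  # computed
run_step("smooth", smooth, "sub-01", {"fwhm": 6})  # unchanged, served from cache
run_step("smooth", smooth, "sub-01", {"fwhm": 4})  # parameter changed, recomputed
```

After these three calls, `calls` holds only the two parameter values that required real work.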

Computing in the Amazon Cloud
• No hardware capital cost
• No hardware maintenance
• No software installation or
configuration*
• Resources scale to meet
need with no overhead
• Available everywhere and
to everybody
• Allows access to exotic
architectures, such as GPUs
*If appropriate AMI is available

Amazon EC2 - Instance
• The hardware on which your processing will
run:

Instance Pricing
• On-demand Pricing
– Always available, fixed
price, non-interruptible,
most stable
• Spot instances
– Market to sell otherwise
unused time, variable
price, interruptible

Spot Instances
• Prices fluctuate over
time
• If price exceeds the max
you are willing to pay,
your instances are
terminated
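
The termination rule can be sketched as a short simulation over a price history. The price series and bid below are made-up values, and hourly billing at the market price is a simplifying assumption; `run_until_terminated` is a hypothetical helper, not part of any AWS tool.

```python
def run_until_terminated(prices, max_bid):
    """Return (hours_run, cost) for one spot instance.

    prices: hourly spot prices in dollars (hypothetical values).
    The instance runs each hour while the price stays at or below max_bid
    and is terminated the first hour the price exceeds it.
    Assumes each completed hour is billed at that hour's market price.
    """
    hours = 0
    cost = 0.0
    for price in prices:
        if price > max_bid:
            break  # price exceeded the maximum bid: instance is terminated
        hours += 1
        cost += price
    return hours, cost


# Made-up price history with a spike in the fourth hour
history = [0.10, 0.12, 0.11, 0.30, 0.10]
hours, cost = run_until_terminated(history, max_bid=0.25)
```

A higher `max_bid` keeps the instance alive through the spike at the price of paying more during that hour, which is the trade-off spot users have to weigh.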

Storage
• S3 – Simple Storage Service
– Secure and stable storage with a web service interface, pay for what you use
– Big and slow, $0.03 per GB/Month
– Can be accessed from anywhere
• EBS – Elastic Block Store
– Provisioned storage (SSD HD) directly connected to instance, pay for what you provision
– Fast and expensive, $0.10 per GB/Month
– Persistent and transferable
• Instance Storage
– SSD storage provided with some instances, included in instance price
– Fast and free
– Non-persistent and non-transferable – good for cache
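
The difference between "pay for what you use" and "pay for what you provision" can be made concrete with the two prices quoted above. The volume sizes in the example are hypothetical, and `monthly_storage_costs` is an illustrative helper only.

```python
S3_PER_GB_MONTH = 0.03   # from the slide
EBS_PER_GB_MONTH = 0.10  # from the slide


def monthly_storage_costs(used_gb, provisioned_gb):
    """Monthly cost in dollars of holding data in S3 versus on an EBS volume.

    S3 bills for the data actually stored; EBS bills for the full
    provisioned volume size whether or not it is filled.
    """
    return {
        "s3": used_gb * S3_PER_GB_MONTH,
        "ebs": provisioned_gb * EBS_PER_GB_MONTH,
    }


# Hypothetical case: 200 GB of data on a 500 GB provisioned volume
costs = monthly_storage_costs(used_gb=200, provisioned_gb=500)
```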

Data Transfer
• In general, free in - pay out
– Out to other Amazon services such as S3, EBS, etc.
is free
– Out to Internet is $0.09 per GB (becomes slightly
cheaper after 10TB or so)
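
As a rough sketch, the transfer rules reduce to a single function. The route labels are hypothetical names chosen for this example, and the flat $0.09 rate ignores the volume discount mentioned above.

```python
OUT_TO_INTERNET_PER_GB = 0.09  # from the slide; flat rate, no volume discount applied


def transfer_cost(gb, route):
    """Cost in dollars of moving gb gigabytes.

    route is a hypothetical label: 'in', 'out_aws', or 'out_internet'.
    """
    if route in ("in", "out_aws"):
        return 0.0  # inbound and Amazon-to-Amazon transfers are free
    if route == "out_internet":
        return gb * OUT_TO_INTERNET_PER_GB
    raise ValueError(f"unknown route: {route}")
```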

Amazon Machine Images
• Virtual machines that provide the software
environment for your processing
• You can build your own, or use one
maintained by others

StarCluster
• StarCluster simplifies the process of building a
Sun Grid Engine based cluster in EC2
– Dynamically add and remove compute nodes
– Uses spot instances
– Provides scripts for performing many
administrative tasks

C-PAC Amazon Machine Image
Nipype

Proof of concept
• Preprocessed 1,112 datasets from
ABIDE with C-PAC
– 4 different preprocessing strategies
(+/- temporal filter, +/- global signal
regression)
– 24 derivatives:
• ReHo, ALFF, fALFF, 10 RSNs, VMHC, binary
degree centrality, weighted degree
centrality, lFCD, time courses for 5 atlases
(AAL, TT, EZ, HO, CC200, CC400)
http://preprocessed-connectomes-project.github.io/abide
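
The four preprocessing strategies are the full cross of two on/off choices, which can be enumerated as below. The option names are illustrative labels, not C-PAC configuration keys.

```python
from itertools import product

# Illustrative option names; each step is either applied (True) or skipped (False)
options = {
    "temporal_filter": (True, False),
    "global_signal_regression": (True, False),
}

# Every combination of the two choices: 2 x 2 = 4 strategies
strategies = [dict(zip(options, combo)) for combo in product(*options.values())]

for strategy in strategies:
    print(strategy)
```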

Model Parameters
• Requires 45 minutes to process 1 dataset
• 3 datasets can be processed in parallel
• Processing results in 0.5 GB of data
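
The parameters on this slide are enough for a back-of-the-envelope estimate of processing time and output size. The sketch below assumes the "3 in parallel" figure applies per instance and that datasets are processed in full batches; `estimate` and `n_instances` are hypothetical names, and no instance price is included because the slide does not give one.

```python
import math

MINUTES_PER_DATASET = 45     # from the slide
PARALLEL_DATASETS = 3        # from the slide; assumed to be per instance
OUTPUT_GB_PER_DATASET = 0.5  # from the slide


def estimate(n_datasets, n_instances=1):
    """Return wall-clock hours, billed instance-hours, and output size in GB."""
    # Each batch fills every slot on every instance and takes 45 minutes
    batches = math.ceil(n_datasets / (PARALLEL_DATASETS * n_instances))
    wall_hours = batches * MINUTES_PER_DATASET / 60
    return {
        "wall_hours": wall_hours,
        "instance_hours": wall_hours * n_instances,
        "output_gb": n_datasets * OUTPUT_GB_PER_DATASET,
    }


# The 1,112 ABIDE datasets on a single instance and on ten instances
single = estimate(1112)
ten = estimate(1112, n_instances=10)
```

Multiplying `instance_hours` by an hourly instance price, and `output_gb` by the storage and transfer rates, gives the three cost components compared in the next slide.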

Cloud vs. Traditional Computing
[Figure, panel 1: Cost ($) vs. Number of Datasets (100 to 50,000), broken down into Instance Cost, Storage Cost, and Transfer Cost]
[Figure, panel 2: Time (hours) vs. Number of Datasets (100 to 50,000), comparing No Download and Total Processing Time]

Impact of Spot Instances
Simulations using past 90 days of spot price history

What about HIPAA?
• Amazon AWS meets FedRAMP and NIST 800-53
standards, which are more rigorous than HIPAA
– Access to instances controlled using 256-bit AES
– Default firewalls deny all outside access
– EC2, EBS, and S3 storage are compatible with encryption
• AWS HIPAA whitepaper
– http://d0.awsstatic.com/whitepapers/compliance/AWS_HIPAA_Compliance_Whitepaper.pdf

Preprocessed INDI Data in the Cloud
http://preprocessed-connectomes-project.github.io/
• Available through S3
Bucket generously
provided by AWS
• Raw INDI will be available
soon

- HCP Data available in the cloud:
- https://wiki.humanconnectome.org/display/PublicData/Home
- Receive $100 AWS Credits at the HCP workshop in Hawaii
- http://humanconnectome.org/course-registration/2015/exploring-the-human-connectome.php

Acknowledgements
CPAC Team: Daniel Clark, Steven Giavasis and Michael Milham.
NDAR “Cloud Team”: Christian Haselgrove, Dave Kennedy, and Jack van
Horn.
NDAR Team: Dan Hall, Brian Koser, David Obenshain, Svetlana Novikova,
and Malcom Jackson.
CPAC-NDAR integration was funded by a contract from NDAR.
ABIDE Preprocessed data is hosted in a Public S3 Bucket provided
by AWS.
