Scaling Systems for Research Computing
adam@bioteam.net
1. The ‘Meta’ Issue: What is driving all of this?
2. Scalable Infrastructure
3. Scalable Software
4. Compliance
5. Intro to BioTeam: Who, What, Why
6. Q&A
Who, What, Why ...
BioTeam
‣ Independent consulting shop
‣ Staffed by scientists forced to
learn IT, SW & HPC to get our
own research done
‣ 10+ years bridging the “gap”
between science, IT & high
performance computing
‣ Our wide-ranging work is what
gets us invited to speak at
events like this ...
Bioinformatics and Big Iron
Culture
BioTeam
‣ We are a distributed company
• BioTeam is 100% REMOTE
• All employees are MANAGERS
• Workflow is mostly ASYNCHRONOUS
‣ Prefer small interdisciplinary TEAMS
• Value placed on TRUST and PERFORMANCE
Today
BioTeam
‣ 10 full-time employees in 2014
• 2 dedicated to HPC Infrastructure
• 2 dedicated to Software Development
• 1 dedicated to Products
• 1 dedicated to Government Services
• 1 dedicated to Cloud Computing
‣ 10+ years supporting Life Sciences Research
The ‘meta’ issue
Science is changing faster than
IT infrastructure
Cloud Computing
Amazon vs. Other Clouds
‣ AWS has by far the most useful IaaS building
blocks today
• First choice for most Bio-IT use cases
‣ AWS quietly rolls out killer features
• Spot Market
• Virtual Private Cloud
‣ Provider decision may be based on where your
data actually resides
Real-world simulation project
Massive resources and APIs galore
Google
‣ Google started with PaaS and worked down
‣ Google Exacycle for Visiting Faculty (closed)
• 1 billion core hours on demand; what’s next?	
‣ Google is DEVELOPER centric; everything has
an API
‣ Culture is based on Science and Engineering
Tools and Techniques
Devops
Configuration Management
‣ Required in almost every cloud project
‣ Chef/Puppet/Ansible/Fabric
• Domain specific languages; Agent-based versus SSH; Abstraction
‣ Key is reducing institutionalized knowledge and sharing
recipes
‣ Docker/LXC could be disruptive
• Lightweight differential images; not very HPC friendly at this point
‣ Orchestration tools lagging behind provisioning and
configuration
‣ Best techniques are making their way back into HPC
Devops
open-source cluster computing toolkit
MIT StarCluster
‣ Ideal for most HPC use cases
• Includes Grid Engine, NFS, and MPI
• NEW Support for Virtual Private Cloud!
‣ Works with Spot Instances
‣ Extensible via plugins
• Hadoop
• HTCondor
• GlusterFS
• IPython Notebook
Private Clouds
Where is your datacenter?
Private Cloud
AWS Regions
Public Cloud
Google Datacenters
Public Cloud
Scalable Software
In modern processors and coprocessors
Types of Parallelism
‣ Instruction level: micro-architectural techniques such as pipelined execution, out-of-order/in-order execution, superscalar execution, branch prediction ...
‣ Vector level: SIMD vector processing instructions (SSE, AVX, Phi)
‣ Thread level: multi-core architectures with or without Hyper-Threading; many-core architectures with round-robin hardware multithreading
‣ Node level: distributed computing, cluster computing
Fully functional multi-thread execution unit
Intel Xeon Phi Coprocessor
‣ 50+ cores with a ring interconnect
‣ 64-bit addressing
‣ Scalar unit based on Intel Pentium family
‣ Vector unit with 512-bit SIMD instructions
‣ 4 hardware threads per core
‣ Highly Parallel device
‣ SMP on-a-chip
Choices
Programming Xeon Phi
Offloaded
‣ Pragma/directives based
‣ Better serial processing
‣ More memory
‣ Better file access
Native
‣ Makes full use of available resources
‣ Simpler programming model
‣ Quicker to test key kernels
‣ Some constraints
• Memory availability
• File I/O access
Mapping with Burrows-Wheeler Aligner (BWA)
Intel Optimization Example
[Bar chart: speedup over the Xeon baseline (1.0); Xeon (optimized) 1.24×, Xeon + Phi 1.86×]
‣ Replace pthreads with OpenMP
‣ Better load balancing
‣ Overlap I/O and Compute
‣ Better thread usage
‣ Efficient memory allocation
‣ Vectorized performance critical
loops
‣ Data prefetch to reduce memory
latency
Source: Life Sciences Optimization - Intel - SC13
Protein sequence analysis with MPI-HMMER
Intel Optimization Example
[Bar chart: speedup over the Xeon baseline (1.0); Xeon + Phi 1.56×]
‣ No source code changes required
‣ Use #pragma unroll to improve
loop performance
‣ Double nested loop in Viterbi
algorithm is auto-vectorized for
Xeon and Phi by Intel compilers
Source: Life Sciences Optimization - Intel - SC13
Assembly with Velour
Intel Optimization Example
‣ Intel and UIUC released an open-source
alternative to velveth
‣ > 10x reduction in memory usage
• Intelligently caching portions of assembly to disk
• 700GB to 60GB
‣ https://github.com/jjcook/velour
‣ Cook, Jeffrey J. 2011. Scaling short read de novo
DNA sequence assembly to gigabase genomes.
Recommendations
Programming Xeon Phi
‣ Host can have multiple Phi cards
‣ MKL (Math Kernel Library) is pre-optimized
‣ OpenMP is applicable to multi-core and many-
core programming
• #pragma offload target(mic)
‣ MPI supports distributed computation and
combines with other models
• OpenMP within nodes and MPI between nodes
‣ Xeon optimizations translate well to Phi
In the Life Sciences
Parallel Programming
‣ Targets: CPU, Coprocessors, GPU, FPGA, ASIC
‣ There is no silver bullet
‣ Problem decomposition is the most critical step
‣ Think in parallel
‣ Using Intel compilers can yield ~30% speedup
in many cases
• VTune and other analysis tools are available
‣ Must optimize at one or more levels
Recommendations
Parallel Programming	
‣ Leaving performance on the table
• Low-hanging fruit: splitting input files into parts
• Avoid languages with a poor concurrency model or a GIL (global interpreter lock)
‣ Exploit thread-level parallelism
• Use multi-threading and multi-processing to fully utilize multicore
processors
‣ Use Intel’s Auto-Vectorizing compiler
• Take advantage of SIMD parallelism and wider vectors on Phi
‣ Prepare for a heterogeneous many-core future
• Hybrid Programming (OpenMP + MPI)
Platforms
Parallel Programming
‣ Intel Distribution for Apache Hadoop
• Enhances open-source Hadoop on Xeon processors
• More efficient; faster startup times
• Management tools
‣ Intel Enterprise Edition Lustre
• Enhances open-source Lustre
• REST API
• Hadoop Adapter
A fresh approach to technical computing
I <3 Julia
‣ Homoiconic; Dynamic type
system
‣ Designed for parallelism and
distributed computation
‣ MATLAB-like syntax and
extensive math library
‣ Call C functions directly
‣ Call Python functions
‣ IJulia Notebook
‣ Open Source
Compliance
Overview
Compliance
‣ Need a compliance apparatus
‣ Often a barrier to competition
‣ Compute and Storage are easy
• Policy and procedures are harder
‣ AWS and Google will now sign a BAA (HIPAA Business Associate Agreement)
Strategy
Compliance
‣ Keys are protecting data and preventing access
‣ Data management - points of control
‣ Encrypt data in flight and at rest
• Use S3 server-side encryption
• Google Persistent Disks are automatically encrypted
‣ Use credential rotation policies
‣ Lock down security groups and firewalls
‣ Use VPN for all public connections
‣ Log everything and audit often
http://biote.am/storage
ACK
http://bioteam.net
http://psc.edu
http://software.intel.com/en-us/mic-developer
http://julialang.org
