OpenStack at SJTU
Predictive Data Mining in Clinical Medicine with Dynamical HPC
SHUQUAN HUANG, 99CLOUD <HUANG.SHUQUAN@99CLOUD.NET>
DR LUO XUAN, SHANGHAI JIAO TONG UNIVERSITY <XLUO@SJTU.EDU.CN>
DR YIH LEONG SUN, INTEL CORPORATION <YIH.LEONG.SUN@INTEL.COM>
Agenda
➡ INTRODUCTION
➡ BACKGROUND
➡ OPENSTACK & OPENHPC
➡ FUTURE WORK
➡ Q&A
Introduction
➡ SHUQUAN HUANG
• Technical Director, 99Cloud
• Focuses on helping enterprises adopt OpenStack clouds and migrate their data and applications
to cloud environments
• Joined the OpenStack community in 2011; has served as ATC, speaker, track chair, etc.
➡ DR. LUO XUAN
• Senior Engineer, Shanghai Jiao Tong University
• BS, MS, and PhD degrees from SJTU
• Focus areas: cloud computing, software-defined networking, data analysis
➡ DR. YIH LEONG SUN
• Senior Software Cloud Architect, Intel Open Source Technology Center
• PhD in multi-cloud orchestration
• More than 17 years of experience in software development and datacenter operations
Background
➡ PROBLEM STATEMENT
• Hardware is distributed, with low utilization
• High failure rate and energy consumption
• SLA is not guaranteed
➡ MISSION OF SJTU CLOUD SERVICES
• Centralize the hardware and improve the overall utilization
• Provide stable services for IT hosting within SJTU
• Serve complex applications in scientific research areas such as
clinical medicine, genome sequence analysis, etc.
• Integrate the data across traditional silos
Predictive Data Mining in Clinical Medicine with Dynamical HPC
➡ HPC @ SJTU
• 400 nodes running CentOS 6.5; number of users up 89% since 2015
• Lustre with RDMA, over 4 PB of storage
• CPU/GPU utilization: over 60%
• InfiniBand with OFED drivers
• Slurm
➡ TRANSLATIONAL MEDICINE RESEARCH
• Genome sequence analysis, splicing and mutation identification
• Computational chemistry and drug design
• Knowledge base search, text search and mining, and database comparison
• Clinical information, electronic medical records, image data, gene expression analysis
➡ STORAGE USAGE BY RESEARCH AREA
• Big Data: 9%
• Natural Sciences: 18%
• Life Sciences: 27%
• Physics and Astronomy: 37%
• Others: 9%
HPC Requirements
➡ MULTI-TENANCY & SELF-SERVICES
➡ EASIER & MORE FLEXIBLE HPC INFRASTRUCTURE MANAGEMENT
➡ IMPROVED OVERALL UTILIZATION
➡ HPC WORKLOAD MANAGEMENT
➡ HIGH THROUGHPUT SUPPORT
What is OpenHPC?
➡ MISSION
• OpenHPC is a Linux Foundation Collaborative Project whose mission is to provide a reference
collection of open-source HPC software components and best practices, lowering barriers to
deployment, advancement, and use of modern HPC methods and tools.
➡ VISION
• OpenHPC components and best practices will enable and accelerate innovation and discoveries
by broadening access to state-of-the-art, open-source HPC methods and tools in a consistent
environment, supported by a collaborative, worldwide community of HPC users, developers,
researchers, administrators, and vendors.
Who contributed to OpenHPC?
OpenHPC Software Stack
Functional Area             Components
Base OS                     CentOS 7.3, SLES 12 SP2
Architecture                x86_64, aarch64 (Tech Preview)
Admin Tools                 Conman, Ganglia, Lmod, LosF, Nagios, pdsh, pdsh-mod-slurm, prun,
                            EasyBuild, ClusterShell, mrsh, Genders, Shine, Spack
Provisioning                Warewulf, xCAT
Resource Management         SLURM, Munge, PBS Professional
Runtimes                    OpenMP, OCR, Singularity
I/O Services                Lustre, BeeGFS
Numerical / Scientific Lib  Boost, GSL, FFTW, METIS, PETSc, Trilinos, Hypre, SuperLU, MUMPS,
                            OpenBLAS, ScaLAPACK
I/O Lib                     HDF5 (pHDF5), NetCDF, ADIOS
Compiler                    GNU (gcc, g++, gfortran)
MPI                         MVAPICH2, OpenMPI, MPICH
Dev Tools                   Autotools (autoconf, automake, libtool), Valgrind, R, SciPy, hwloc
Perf Tools                  PAPI, IMB, mpiP, pdtoolkit, TAU, Scalasca, Score-P, SIONlib
https://github.com/openhpc/submissions
Build an OpenHPC Recipe
HPC as a Service
➡ NODES (ARCHITECTURE DIAGRAM)
• Controller
• HPC Head Node
• HPC Compute Nodes (three shown)
➡ NETWORKS
• High Speed Fabric / HPC Data Interface
• BMC interface
• Provisioning interface
• External Network
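The node roles and network names on this slide can be captured in a small data model. The following is a minimal Python sketch, not the toolkit's actual configuration: the slide does not say which node attaches to which network, so the attachments below, the short network keys, and the names `Node`, `build_cluster`, and `nodes_on` are all hypothetical.

```python
from dataclasses import dataclass

# Short keys for the four networks named on the slide.
NETWORKS = ("fabric", "bmc", "provisioning", "external")


@dataclass(frozen=True)
class Node:
    name: str
    role: str          # "controller", "head", or "compute"
    networks: tuple    # subset of NETWORKS


def build_cluster(compute_count=3):
    """Return one controller, one head node, and `compute_count` compute nodes.

    ASSUMPTION: the network attachments are illustrative guesses,
    not taken from the slide.
    """
    nodes = [
        Node("controller", "controller", ("bmc", "provisioning", "external")),
        Node("head", "head", ("fabric", "provisioning", "external")),
    ]
    for i in range(1, compute_count + 1):
        nodes.append(
            Node(f"compute-{i}", "compute", ("fabric", "bmc", "provisioning"))
        )
    return nodes


def nodes_on(nodes, network):
    """List the names of nodes attached to the given network."""
    if network not in NETWORKS:
        raise ValueError(f"unknown network: {network}")
    return [n.name for n in nodes if network in n.networks]
```

For example, `nodes_on(build_cluster(), "fabric")` lists the head node and the three compute nodes under the assumed attachments.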
HPC Cloud Burst
➡ NODES (ARCHITECTURE DIAGRAM)
• Controller
• HPC Compute Nodes (three shown)
• HPC Head Node
• HPC Compute Nodes (three more shown)
➡ NETWORKS
• High Speed Fabric / HPC Data Interface
• BMC interface
• Provisioning interface
• External Network
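The slide does not describe how a burst is sized, so the following Python sketch shows only one plausible policy: request enough cloud compute nodes to cover the shortfall between queued demand and idle local nodes, capped by a quota. The function name `cloud_nodes_needed` and all three parameters are hypothetical and are not part of the HPC Cloud Toolkit.

```python
def cloud_nodes_needed(pending_node_demand, idle_local_nodes, max_cloud_nodes):
    """Return how many cloud compute nodes to request for a burst.

    ASSUMPTION: a simple shortfall policy chosen for illustration;
    the presentation does not specify the real decision logic.
    """
    if min(pending_node_demand, idle_local_nodes, max_cloud_nodes) < 0:
        raise ValueError("inputs must be non-negative")
    # Demand that the idle local nodes cannot absorb.
    shortfall = max(pending_node_demand - idle_local_nodes, 0)
    # Never request more than the cloud quota allows.
    return min(shortfall, max_cloud_nodes)
```

With a queue that needs four nodes, two idle local nodes, and a quota of three, this policy requests two cloud nodes; when local capacity already covers the queue, it requests none.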
HPC as a Service
HPC Cloud Toolkit
https://github.com/hpc-cloud-toolkit/ostack-hpc
➡ RELEASE 0.7
• Tested OS: CentOS 7.3
• OpenStack version: OpenStack Ocata
• OpenHPC version: 1.3.1
➡ MAILING LIST
• hpc-cloud-toolkit-users (https://groups.io/g/hpc-cloud-toolkit-users)
• For general questions, discussions, and suggestions from users.
• hpc-cloud-toolkit-devel (https://groups.io/g/hpc-cloud-toolkit-devel)
• For developers contributing to hpc-cloud-toolkit.
THANK YOU.
Questions?
