Deep Learning Resources at MDC
Alf Wachsmann
Feb 13th, 2019

Agenda
1. Available hardware suitable for DL
2. How to get access
3. How to use it

Available Hardware
Nvidia DGX-1 (maxg01)
- 8x Nvidia Tesla V100
- 512 GB 2,133 MHz DDR4 RDIMM
- 4x 1.92 TB SSD RAID 0 (7 TB usable)
- 2x Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores each)
- 1x 1 Gb/s Ethernet (will go up to 10 Gb/s soon)
Temporary loaner hardware from DDN:
- 4x Mellanox EDR IB ConnectX-4
- Lustre access to AI200 storage via RDMA from inside Docker container(s)
Mellanox SB7800 InfiniBand EDR 100 Gb/s switch
DDN AI200 (pre-production loaner from DDN)
- 4x EDR InfiniBand
- 21x 2.5" dual-port 1.92 TB NVMe SSDs (26 TB usable)
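The "7 TB usable" figure for the DGX-1's RAID 0 scratch volume is consistent with the raw disk sizes once vendor (decimal) terabytes are converted to the binary tebibytes that tools like `df -h` report. A minimal Python sketch of that arithmetic, assuming the slide's "TB" for usable space means TiB and ignoring filesystem overhead (the function name is illustrative). The AI200 figure (26 TB usable from 21 drives) depends on a data-protection scheme the slide does not state, so it is not modelled here.

```python
# Sketch: convert RAID 0 raw capacity (decimal TB, as vendors quote disks)
# into binary TiB. Assumes the slide's "7 TB usable" is meant in TiB and
# ignores filesystem overhead.

TB = 10**12   # decimal terabyte (vendor unit)
TIB = 2**40   # binary tebibyte (what `df -h` labels "T")


def raid0_usable_tib(n_disks: int, disk_tb: float) -> float:
    """RAID 0 stripes data across all disks with no redundancy,
    so usable capacity equals the sum of the disk sizes."""
    raw_bytes = n_disks * disk_tb * TB
    return raw_bytes / TIB


if __name__ == "__main__":
    # DGX-1 local scratch from the slide: 4x 1.92 TB in RAID 0
    print(f"{raid0_usable_tib(4, 1.92):.2f} TiB")
```

For the DGX-1 this prints 6.98 TiB, which rounds to the 7 TB on the slide.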

Available Hardware
HPE ProLiant DL380 Gen10 (maxg03, maxg04)
- 3x Nvidia Tesla V100 (maxg04)
- 2x Nvidia Tesla V100 (maxg03)
- 192 GB 2,133 MHz DDR4 RDIMM
- 2x Intel(R) Xeon(R) Gold 6134 CPU v5 @ 3.20GHz (16 cores total)
- 2x 10 Gb/s Ethernet
AG Daumke (maxg02):
- 4x Nvidia Pascal TITAN Xp
- 92 GB 2,133 MHz DDR4 RDIMM
- 2x Intel(R) Xeon(R) Silver 4110 CPU v5 @ 2.10GHz (16 cores total)
- 1x 10 Gb/s Ethernet

Available Hardware
Nvidia Tesla V100 (figures for the PCIe version; the SXM2 modules in the DGX-1 use NVLink and are rated somewhat higher):
- 640 Tensor Cores
- 5120 CUDA Cores
- Double-Precision: 7 teraFLOPS
- Single-Precision: 14 teraFLOPS
- Deep Learning: 112 teraFLOPS
- Interconnect Bandwidth: 32 GB/s
- Memory: 16 GB HBM2
- Max Power Consumption: 250 W
- Data center quality
Nvidia Pascal TITAN Xp:
- 3840 CUDA Cores
- Memory: 12 GB GDDR5X
- Max Power Consumption: 250 W
- Consumer gaming quality
All data and pictures are from the Nvidia website.
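When picking a node for a job, it can help to total the per-GPU figures above per host. A small Python sketch that uses only the GPU counts, memory sizes and core counts from these slides; the data layout and helper names are hypothetical, and peak figures say nothing about the throughput a real job will reach.

```python
# Sketch: per-node GPU totals from the figures on these slides.
# Hostnames, GPU counts and per-GPU specs are copied from the slides;
# the data layout and helper names are illustrative only.

GPU_SPECS = {
    # model: (memory per GPU in GB, CUDA cores per GPU)
    "Tesla V100": (16, 5120),
    "TITAN Xp": (12, 3840),
}

NODES = {
    # hostname: (number of GPUs, GPU model)
    "maxg01": (8, "Tesla V100"),  # Nvidia DGX-1
    "maxg02": (4, "TITAN Xp"),    # AG Daumke
    "maxg03": (2, "Tesla V100"),  # HPE ProLiant DL380 Gen10
    "maxg04": (3, "Tesla V100"),  # HPE ProLiant DL380 Gen10
}


def node_totals(host: str) -> dict:
    """Total GPU count, GPU memory (GB) and CUDA cores for one node."""
    count, model = NODES[host]
    mem_gb, cores = GPU_SPECS[model]
    return {"gpus": count, "gpu_mem_gb": count * mem_gb, "cuda_cores": count * cores}


def nodes_fitting(n_gpus: int, min_mem_per_gpu_gb: int) -> list:
    """Hosts with at least n_gpus GPUs that each have >= min_mem_per_gpu_gb."""
    return sorted(
        host
        for host, (count, model) in NODES.items()
        if count >= n_gpus and GPU_SPECS[model][0] >= min_mem_per_gpu_gb
    )


if __name__ == "__main__":
    for host in sorted(NODES):
        print(host, node_totals(host))
    # Which nodes could run a job needing 3 GPUs with at least 16 GB each?
    print(nodes_fitting(3, 16))
```

With the slide figures, the DGX-1 offers 8 x 16 GB = 128 GB of GPU memory in total, and only maxg01 and maxg04 can host a job that needs three 16 GB GPUs.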

How to get access
- All resources are connected to the Max Cluster
- Should be accessed via the batch system
- Documentation: https://nagios.mdc-berlin.net/prod/wiki/doku.php?id=public:manuals:hpc:intro-en:usage#getting_access_to_the_gpu_compute_nodes
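Inside a batch job it is useful to check which GPUs the job may actually use. Many schedulers restrict GPU visibility through the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`; whether the Max Cluster batch system does so is an assumption here, so check the documentation linked above for how GPUs are requested and assigned. A minimal Python sketch (the function name is hypothetical):

```python
# Sketch: report which GPUs a job may use, ASSUMING GPU visibility is
# restricted via the standard CUDA_VISIBLE_DEVICES variable. How GPUs are
# requested on the Max Cluster is described in the MDC documentation.
import os
from typing import List, Mapping, Optional


def visible_gpus(env: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Return the GPU indices/UUIDs listed in CUDA_VISIBLE_DEVICES.

    None -> variable not set: CUDA applications see every GPU in the node.
    []   -> variable set but empty: CUDA applications see no GPU.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("CUDA_VISIBLE_DEVICES not set: all GPUs on this node are visible")
    else:
        print(f"{len(gpus)} GPU(s) visible: {gpus}")
```

This simple parser does not reproduce every CUDA rule (for example, how CUDA treats an invalid entry in the list); it is only meant as a quick sanity check at the start of a job.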

Example: stardist
Example from Deep Learning Club (Dec 3rd, 2018):
Uwe Schmidt and Martin Weigert from MPI-CBG
"Deep learning based image restoration and cell segmentation for fluorescence microscopy"
Read more about their work and methods: https://github.com/mpicbg-csbd/stardist
Containers are an easy way to try out software. Read our (short) documentation about containers on Max Cluster: https://nagios.mdc-berlin.net/prod/wiki/doku.php?id=public:manuals:hpc:user-guide:05-containers

How to use it: Containers!
https://ngc.nvidia.com/catalog/containers
Nvidia provides a collection of GPU-optimized containers with all necessary software built into them.

Using containers on Max Cluster
Create your own Singularity container from the TensorFlow Docker container, with the stardist software in it.
N.B.: Building needs sudo/root, i.e. use your own computer to build the container.

    $ cat stardist.singularity
    Bootstrap: docker
    From: nvcr.io/nvidia/tensorflow:18.11-py3

    %post
        apt-get -y update && apt-get -y install firefox
        pip install jupyter
        pip install stardist
        mkdir /notebooks && chmod a+rwx /notebooks

    %runscript
        jupyter notebook --notebook-dir=/notebooks --ip 0.0.0.0 --allow-root

    $ sudo singularity build /tmp/stardist.sif stardist.singularity
    # Image will run on CentOS, Ubuntu, etc.

Download https://github.com/mpicbg-csbd/stardist to /home/awachs/Software/stardist/, then start the container:

    $ singularity run --nv -B /home/awachs/Software/stardist/examples:/notebooks -B /tmp:/run /tmp/stardist.sif
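If you want to start the container from a script (for example a job wrapper), the `singularity run` command from the slide can be assembled programmatically. A minimal Python sketch; the paths are the ones on the slide and should be replaced with your own, and the helper name is hypothetical. It only uses the `--nv` and `-B src:dst` options already shown above.

```python
# Sketch: build the `singularity run` command from the slide as an argument
# list. Paths are copied from the slide; the helper name is illustrative.
import shlex
from typing import List, Sequence, Tuple


def singularity_run_cmd(image: str,
                        binds: Sequence[Tuple[str, str]],
                        nv: bool = True) -> List[str]:
    """Build an argument list for `singularity run`.

    --nv       makes the host's Nvidia driver and GPUs available in the container.
    -B src:dst bind-mounts host directory src at dst inside the container.
    """
    cmd = ["singularity", "run"]
    if nv:
        cmd.append("--nv")
    for src, dst in binds:
        cmd += ["-B", f"{src}:{dst}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    cmd = singularity_run_cmd(
        "/tmp/stardist.sif",
        [("/home/awachs/Software/stardist/examples", "/notebooks"),
         ("/tmp", "/run")],
    )
    # Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
    print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would actually start the container, and with it the Jupyter server defined in the `%runscript` section. Building an argument list rather than a single string avoids shell-quoting problems with unusual paths.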

Using containers on Max Cluster
- The example above shows interactive use.
- Submitting to the batch system works just as well. Please consult our documentation.
- Important command to know about:

Trends
DL is now big business. More specialized hardware for large-scale inference will appear.
Cloud providers will always offer the latest HW.
Intel:
- Xeon v6 ("Cascade Lake")
- Vector Neural Network Instructions (AVX-512_VNNI) to speed up inference
Nvidia:
- T4 Tensor Core GPU for AI inference:
  - 320 Turing Tensor Cores
  - 2,560 NVIDIA CUDA® cores
  - Max Power Consumption: 70 W
- Pre-trained networks (the ones below are for medicine):
  - NVIDIA Transfer Learning Toolkit: https://developer.nvidia.com/transfer-learning-toolkit
  - NVIDIA AI-Assisted Annotation SDK: https://developer.nvidia.com/clara/annotation
Google:
- Pre-trained variant calling network: https://github.com/google/deepvariant
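Whether a given node can use the VNNI instructions mentioned above can be checked from the CPU feature flags that the Linux kernel lists in `/proc/cpuinfo`, where the flag is named `avx512_vnni`. A minimal Python sketch (function names are hypothetical). A present flag only means the CPU supports the instructions; whether your framework build actually uses them is a separate question.

```python
# Sketch: check whether a Linux node's CPUs advertise AVX-512 VNNI.
# The kernel lists CPU feature flags in /proc/cpuinfo; the VNNI flag is
# named "avx512_vnni". Framework support for VNNI is a separate matter.
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_vnni(cpuinfo_text: str) -> bool:
    """True if the 'avx512_vnni' flag is present."""
    return "avx512_vnni" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX-512 VNNI:", has_vnni(path.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux system?)")
```

On the Broadwell and Skylake CPUs listed on the hardware slides this should report False, since VNNI first appeared with Cascade Lake.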
