BSC LMS DDL

Large Model Support and Distributed Deep Learning features

Published in: Technology

  1. © 2018 IBM Corporation, IBM Cognitive Systems
     IBM OpenPOWER foundation Barcelona Supercomputing Center
     Power9 AI differentiators
     Ander Ochoa (ander.ochoa.gilo@ibm.com), Cognitive Systems Technical Architect for SPGI
     https://es.linkedin.com/in/anderotxoa, @AnderOtxoa
     IBM 2018
  2. Agenda
     • Large Model Support (LMS)
     • Distributed Deep Learning (DDL)
     • PowerAI Vision
  3. Large Model Support (LMS)
     Objective: overcome GPU memory limitations in DL training by increasing the batch size and/or the resolution of the features.
     LMS enables processing of high-definition images, large models, and larger batch sizes that do not fit in GPU memory today (the maximum GPU memory available in NVIDIA P100 and V100 GPUs is 16/32 GB).
     Available for:
     - Caffe
     - TensorFlow
     - Chainer
     https://www.sysml.cc/doc/127.pdf
     [Diagram: dataset in system RAM (2 TB) connected to GPU RAM (16/32 GB) over NVLink v2.0; caption "Accelerated by NVLink"]
  4. LMS advantages running in Power architecture
     • LMS DL workload on Power vs. LMS DL workload on x86: NVLink provides up to a 380% advantage (depends on the workload/dataset)
     • Non-LMS DL workload on Power vs. non-LMS DL workload on x86: NVLink provides a 30% advantage (depends on the workload/dataset)
     [Diagram labels, Power9: NVMe storage (N TB); NVMe over PCIe v4 @ 32-140 GB/s; system RAM (DDR4, 2 TB); NVLink v2.0 @ 150 GB/s; GPU RAM (HBM2, 16/32 GB)]
     [Diagram labels, x86: SAS storage (N TB); SAS over PCIe v3 @ 12 Gb/s; system RAM (DDR4, 1 TB); PCIe v3 @ 16 GB/s; GPU RAM (HBM2, 16/32 GB)]
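
As a rough illustration of why the host-to-GPU link matters for LMS, the sketch below computes the ideal time to move one GPU's worth of memory (16 GB) over each link, using only the peak figures quoted on this slide (150 GB/s for NVLink v2.0, 16 GB/s for PCIe v3). The helper name `transfer_seconds` is hypothetical, and real transfers will be slower than these peak-bandwidth estimates.

```shell
# Ideal transfer time in seconds = size (GB) / bandwidth (GB/s).
# Peak-bandwidth estimate only: ignores latency, contention and protocol overhead.
transfer_seconds() {
  awk -v gb="$1" -v bw="$2" 'BEGIN { printf "%.2f\n", gb / bw }'
}

echo "NVLink v2.0: $(transfer_seconds 16 150) s"
echo "PCIe v3:     $(transfer_seconds 16 16) s"
```

Under these assumptions, swapping a full 16 GB of GPU memory takes on the order of a tenth of a second over NVLink and about a second over PCIe v3, which is consistent with the slide's point that LMS workloads benefit most from the faster link.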
  5. AC922 system buses and components diagram
     [Figure; legible labels: 32-140+ GB/s, 64 GB/s]
  6. https://www.linkedin.com/pulse/deep-learning-high-resolution-images-large-models-sumit-gupta/
     This comparison is between an AC922 server with four NVIDIA Tesla V100 GPUs and a server with two Intel Xeon 2640v4 CPUs and four NVIDIA Tesla V100 GPUs.
  7. LMS in Caffe
     $ caffe train -solver solver.prototxt -bvlc -gpu 0,1,2,3 -lms 10000 -lms_frac 0.5
     • -lms 10000: any memory chunk allocation larger than 10000 KB is made in CPU memory and fetched to GPU memory only when needed for computation.
     • -lms_frac 0.5: LMS does not kick in until more than 50% of GPU memory is expected to be utilized.
     Configuring the "lms" and "lms_frac" values depends on the following factors:
     • Batch size used
     • Model used
     • Number of GPUs used
     • System memory available
     Arriving at an optimal configuration requires understanding these factors and experimenting accordingly. As a general guideline, the optimal configuration should use GPU memory as close to full as possible.
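
One way to read the two flags together is as a simple placement rule. The sketch below is a simplified model of the behaviour described on this slide, not IBM Caffe's actual allocator: `lms_placement` is a hypothetical helper, and the `-lms_frac` value is passed as an integer percentage to keep the shell arithmetic simple.

```shell
# Hypothetical model of the rule described above (not Caffe code).
# Args: chunk size (KB), -lms threshold (KB),
#       expected GPU memory use (%), -lms_frac as a percentage.
lms_placement() {
  chunk_kb=$1; lms_kb=$2; used_pct=$3; frac_pct=$4
  # A chunk goes to host memory only if LMS is active (expected use above
  # the fraction) and the chunk is larger than the -lms threshold.
  if [ "$used_pct" -gt "$frac_pct" ] && [ "$chunk_kb" -gt "$lms_kb" ]; then
    echo host
  else
    echo gpu
  fi
}

lms_placement 20000 10000 80 50   # large chunk, GPU memory under pressure
lms_placement 20000 10000 30 50   # large chunk, LMS not active yet
lms_placement 4000 10000 80 50    # small chunk stays on the GPU
```

Read this way, lowering `-lms` moves more allocations to system memory (more room, more transfers), while raising `-lms_frac` delays LMS until the GPU is closer to full.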
  8. Demo
     https://developer.ibm.com/linuxonpower/2017/09/22/realizing-value-large-model-support-lms-powerai-ibm-caffe/
     ssh bsc18651@plogin2.bsc.es
     srun -N 1 --exclusive --gres="gpu:4" --pty $SHELL
     # Set the CPU to performance mode
     lscpu; ppc64_cpu --smt; ppc64_cpu --smt=2
     cpupower -c all frequency-set -g performance
     # Check GPU status
     nvidia-smi; nvidia-smi -i 0 -q; nvidia-smi -ac 877,1530
     # Activate Caffe
     cd /gpfs/scratch/bsc18/bsc18040/lms
     source /opt/DL/caffe/bin/caffe-activate
     # Show the solver
     cat solver.prototxt
     # Check that the model has batchsize = 1
     vi models/googlenet_big.prototxt
     caffe train -solver solver.prototxt -bvlc -gpu 0,1,2,3
     # Change the model to batchsize = 5; training will give an error (out of memory)
     vi models/googlenet_big.prototxt
     caffe train -solver solver.prototxt -bvlc -gpu 0,1,2,3
     # Check that the model has batchsize = 5, then train with LMS enabled
     vi models/googlenet_big.prototxt
     caffe train -solver solver.prototxt -bvlc -gpu 0,1,2,3 -lms 10000 -lms_frac 0.5
  9. Agenda
     • Large Model Support (LMS)
     • Distributed Deep Learning (DDL)
     • PowerAI Vision
  11. Distributed Deep Learning
      Objective: overcome the server boundaries of some DL frameworks.
      How: scaling, using "ddlrun" applied to topology-aware distributed frameworks.
      Our software does deep learning training fully synchronously with very low communication overhead.
      The overall goal of ddlrun is to improve the experience of DDL users. To this end, the primary features of ddlrun are:
      • Error checking/configuration verification
      • Automatic rankfile generation
      • Automatic mpirun option handling
      Available for:
      • TensorFlow
      • IBM Caffe
      • Torch
      How to code topology-aware distributed models:
      • https://netweblog.wordpress.com/2018/04/10/distributed-tensorflow-sample-code-and-how-it-works/
      • https://arxiv.org/pdf/1704.04560.pdf
      https://www.sysml.cc/doc/127.pdf
      Good for:
      • Speed
      • Accuracy
  12. Distributed Deep Learning (DDL) for the training phase: using the power of 100s of servers
      August 8, 2017: "16 Days Down to 7 Hours: Near Ideal Scaling to 256 GPUs and Beyond"
      1 system: 16 days. 64 systems: 7 hours (58x faster).
      ResNet-101, ImageNet-22K, Caffe with PowerAI DDL, running on Minsky (S822LC) Power System
      https://www.ibm.com/blogs/research/2017/08/distributed-deep-learning/
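
The headline numbers can be sanity-checked with a little arithmetic. Assuming the rounded figures on the slide (16 days on 1 system, 7 hours on 64 systems), the sketch below computes the speedup and the scaling efficiency relative to ideal 64x scaling. `scaling_summary` is a hypothetical helper name.

```shell
# Args: baseline time (hours), parallel time (hours), number of systems.
# Speedup = baseline / parallel; efficiency = speedup / systems.
scaling_summary() {
  awk -v b="$1" -v t="$2" -v n="$3" 'BEGIN {
    s = b / t
    printf "speedup %.1fx, efficiency %.0f%%\n", s, 100 * s / n
  }'
}

baseline_hours=$((16 * 24))          # 16 days expressed in hours
scaling_summary "$baseline_hours" 7 64
```

The rounded inputs give a speedup of roughly 55x, so the 58x quoted on the slide presumably comes from unrounded timings; taking 58x at face value corresponds to about 91% of ideal scaling on 64 systems.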
  13. Demo
      ssh bsc18651@plogin2.bsc.es   # Slurm login node
      # You should have a ~/data dir with the dataset downloaded, or an internet connection to download it
      # Edit ~/.bashrc and include the following line
      export TMPDIR=/tmp/
      # To pass all the variables, like activate ...., you may need to write a simple submission script. Run it as "sbatch script.sh"
      [bsc18651@p9login2 ~]$ cat script.sh
      #!/bin/bash
      #SBATCH -J test
      #SBATCH -D .
      #SBATCH -o test_%j.out
      #SBATCH -e test_%j.err
      #SBATCH -N 2
      #SBATCH --ntasks-per-node=4
      #SBATCH --gres="gpu:4"
      #SBATCH --time=01:00:00
      module purge
      module load anaconda2 powerAI
      source /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-activate
      export TMPDIR="/tmp/"
      export DDL_OPTIONS="-mode b:4x2"
      NODE_LIST=$(scontrol show hostname $SLURM_JOB_NODELIST | tr '\n' ',')
      NODE_LIST=${NODE_LIST%?}
      cd examples/mnist
      ddlrun -n 8 -H $NODE_LIST python mnist-init.py --ddl_options="-mode b:4x2" --data_dir /home/bsc18/bsc18651/examples/mnist/data
      [bsc18651@p9login2 ~]$ sbatch script.sh
      https://developer.ibm.com/linuxonpower/2018/05/01/improved-ease-use-ddl-powerai/
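
The two `NODE_LIST` lines in the script are easy to misread, so here is a minimal sketch of what they do, runnable without Slurm. `fake_hostnames` is a hypothetical stand-in for `scontrol show hostname $SLURM_JOB_NODELIST`, assumed here to print one hostname per line, and the hostnames themselves are made up.

```shell
# Hypothetical stand-in for `scontrol show hostname $SLURM_JOB_NODELIST`.
fake_hostnames() { printf 'p9r1n01\np9r1n02\n'; }

# Join the hostnames with commas, then drop the trailing comma.
NODE_LIST=$(fake_hostnames | tr '\n' ',')
NODE_LIST=${NODE_LIST%?}

# One rank per GPU: nodes x GPUs per node (2 x 4 in the demo script).
NODES=$(fake_hostnames | wc -l | tr -d ' ')
GPUS_PER_NODE=4
RANKS=$((NODES * GPUS_PER_NODE))

echo "ddlrun -n $RANKS -H $NODE_LIST"
```

With two nodes and four GPUs each this yields 8 ranks, matching the `-n 8` in the demo command and the 4x2 product in its `-mode b:4x2` option.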
  14. Agenda
      • Large Model Support (LMS)
      • Distributed Deep Learning (DDL)
      • PowerAI Vision
  15. PowerAI Vision v1.1
      https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&subtype=ca&appname=gpateam&supplier=877&letternum=ENUSZP18-0143
      PowerAI Vision V1.1 can help provide robust end-to-end workflow support for deep learning models related to computer vision. This enterprise-grade software provides a complete ecosystem to label raw data sets for training, creating, and deploying deep learning-based models. PowerAI Vision is designed to empower subject matter experts with no skills in deep learning technologies to train models for AI applications. It can help train highly accurate models to classify images and detect objects in images and videos.
      PowerAI Vision is built on open source frameworks.
      • User interface-driven interaction to configure and manage lifecycles of data sets and models
      • A differentiated capability where trained deep learning models automatically detect objects from videos
      • Preconfigured deep learning models specialized to classify and detect objects
      • Preconfigured hyper-parameters optimized to classify and detect objects
      • Training visualization and runtime monitoring of accuracy
      • Integrated inference service to deploy models in production
      • Scalable architecture designed to run deep learning, high-performance analytics, and other long-running services and frameworks on shared resources
  16. PowerAI Vision demo
      http://9.172.154.29:9080/powerai-vision/index.html
  17. Artificial Intelligence Power8 System. Available NOW!
      • To facilitate and expedite PoCs, demos, workshops…
      • To offer to clients/BPs/partners so they can test it by themselves
      • To show it to the world :)
      Located in the IBM Client Center (Madrid)
      Accessible from: IBM "9" network; Internet (on demand)
      Tech specs:
      - S822LC Power8 server (20 P8 cores)
      - 2x Nvidia P100 GPUs (7168 CUDA cores)
      - 2x 500 GB SSD + 1.6 TB NVMe storage
      - 256 GB RAM
      - Ubuntu 16.04
      - PowerAI v1.4 + VisionAI TP4
  18. IBM ICP + IBM DSX + IBM AI. Available NOW!
      • To facilitate and expedite PoCs, demos, workshops…
      • To offer to clients/BPs/partners so they can test it by themselves
      • To show it to the world :)
      Located in the IBM TEC (Madrid)
      Accessible from: IBM "9" network (for the time being)
      Tech specs:
      - S822L Power8 server (10 P8 cores)
        - 2 LPARs
        - 2x 500 GB HDD
        - 512 GB RAM
        - Ubuntu 16.04
        - IBM Cloud Private
        - IBM Data Science Experience
      - S822L Power8 server (10 P8 cores)
        - 2x Nvidia P100 GPUs (7168 CUDA cores)
        - 2x 500 GB SSD + 1.6 TB NVMe storage
        - 256 GB RAM
        - Ubuntu 16.04
        - PowerAI tools
      ICP: https://9.172.229.247:8443/console/
      DSX: https://9.172.229.247:31843
  19. THANK YOU!